diff --git "a/data/prs.json" "b/data/prs.json" --- "a/data/prs.json" +++ "b/data/prs.json" @@ -1,29225 +1,28211 @@ [ { - "additions": 18, - "author": "JaredforReal", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? ### Get the rope operation right Before: NeoX split-half style After: GPT-J/interleaved style(`interleaved=True` same as `is_neox_style=Flase`) the right one ### Get rid of `F.relu` Reason: - `F.relu` works with `ac\u2026", - "changed_files": 2, + "additions": 10, + "author": "Abdennacer-Badaoui", + "author_association": "MEMBER", + "body_excerpt": "`_register_model_output_pytree_node` was calling set.__contains__ during TorchDynamo tracing, which is unsupported in PyTorch 2.8.0 (ROCm). Added an early return when `torch.compiler.is_compiling()` is True, since pytree nodes are already\u2026", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/45017", - "created_at": "2026-03-26T09:21:10Z", - "deletions": 28, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/45282", + "created_at": "2026-04-07T08:50:54Z", + "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/45017/files", - "html_url": "https://github.com/huggingface/transformers/pull/45017", + "files_url": "https://github.com/huggingface/transformers/pull/45282/files", + "html_url": "https://github.com/huggingface/transformers/pull/45282", "labels": [], "merged": false, - "number": 45017, - "review_comments_count": 5, + "number": 45282, + "review_comments_count": 0, "state": "open", - "title": "[WIP][Fix] GLM 5 set `apply_rotary_pos_emb` to `is_neox_style=False` && remove `F.relu()`", - "updated_at": "2026-03-26T10:14:50Z" + "title": "[AMD CI] Fix torch.compile/export failures on AMD CI due to untraceable set.__contains__ ", + "updated_at": "2026-04-07T09:00:34Z" }, { - "additions": 64, - "author": "inisis", + "additions": 6, + "author": "zhang-prog", "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? save locally --> local locally) ```\u2026", - "changed_files": 2, + "additions": 6, + "author": "kallewoof", + "author_association": "CONTRIBUTOR", + "body_excerpt": "Pre-patch unnecessarily breaks merging a LoRA adapter with a model using CUDA_VISIBLE_DEVICES= e.g. when VRAM is insufficient. It also breaks non-cuda machine operations (such as merging). # What does this PR do? This PR un-breaks `CUDA_VI\u2026", + "changed_files": 6, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44730", - "created_at": "2026-03-15T20:44:32Z", - "deletions": 4, + "comments_count": 5, + "conversation_url": "https://github.com/huggingface/transformers/pull/44980", + "created_at": "2026-03-24T23:50:07Z", + "deletions": 6, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44730/files", - "html_url": "https://github.com/huggingface/transformers/pull/44730", + "files_url": "https://github.com/huggingface/transformers/pull/44980/files", + "html_url": "https://github.com/huggingface/transformers/pull/44980", "labels": [], - "merged": true, - "number": 44730, - "review_comments_count": 6, + "merged": false, + "number": 44980, + "review_comments_count": 0, "state": "closed", - "title": "Fix `mlcd` auto config/model/mapping issues", - "updated_at": "2026-03-16T12:12:30Z" + "title": "bug-fix: do not assume torch.cuda is available when setting up norm values, even if flash linear attention is available", + "updated_at": "2026-03-27T13:25:18Z" }, { - "additions": 214, - "author": "xenova", + "additions": 492, + "author": "michaelbenayoun", "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? This PR introduces a helper utility function, `int_div_ceil`, which performs `math.ceil(a / b)` for non-negative integer operands. This is necessary as the current approach is both error-prone and imprecise (especia\u2026", - "changed_files": 58, + "body_excerpt": "# What does this PR do? Introduces `src/transformers/module_fusion.py`, a utility for fusing adjacent submodules in a model into a single FusedModule that executes them as a chain in one forward pass. The key components are: - `RegistryCol\u2026", + "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44729", - "created_at": "2026-03-15T20:29:38Z", - "deletions": 225, + "conversation_url": "https://github.com/huggingface/transformers/pull/44979", + "created_at": "2026-03-24T22:33:31Z", + "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44729/files", - "html_url": "https://github.com/huggingface/transformers/pull/44729", + "files_url": "https://github.com/huggingface/transformers/pull/44979/files", + "html_url": "https://github.com/huggingface/transformers/pull/44979", "labels": [], "merged": false, - "number": 44729, + "number": 44979, "review_comments_count": 0, "state": "open", - "title": "Avoid floating point math for ceil operations", - "updated_at": "2026-03-15T20:49:34Z" + "title": "Module Fusion API", + "updated_at": "2026-03-30T19:32:58Z" }, { - "additions": 88, - "author": "ajmeese7", + "additions": 4, + "author": "cjkindel", "author_association": "NONE", - "body_excerpt": "# What does this PR do? Fixes a GPU memory leak in `Bnb4bitQuantize.convert()` where float16 source tensors are never freed during 4-bit quantized model loading via `from_pretrained`, causing OOM on models whose float16 size exceeds GPU VR\u2026", - "changed_files": 2, + "body_excerpt": "# What does this PR do? `_can_set_attn_implementation` and `_can_set_experts_implementation` both do a direct subscript lookup into `sys.modules`: ```python class_module = sys.modules[cls.__module__] ``` If the module is not registered und\u2026", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 4, - "conversation_url": "https://github.com/huggingface/transformers/pull/44728", - "created_at": "2026-03-15T19:56:44Z", - "deletions": 1, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44978", + "created_at": "2026-03-24T21:01:11Z", + "deletions": 4, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44728/files", - "html_url": "https://github.com/huggingface/transformers/pull/44728", - "labels": [], + "files_url": "https://github.com/huggingface/transformers/pull/44978/files", + "html_url": "https://github.com/huggingface/transformers/pull/44978", + "labels": [ + "Code agent slop" + ], "merged": false, - "number": 44728, + "number": 44978, "review_comments_count": 0, "state": "closed", - "title": "Fix float16 memory leak during 4-bit quantized model loading", - "updated_at": "2026-03-16T20:53:54Z" + "title": "fix: handle absent sys.modules entry in modeling_utils", + "updated_at": "2026-03-26T12:25:31Z" }, { - "additions": 202, - "author": "LincolnBurrows2017", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "Fixed issue where kwargs like force_download, proxies, token were not being passed to cached_file function.", - "changed_files": 11, + "additions": 2, + "author": "hmellor", + "author_association": "MEMBER", + "body_excerpt": "- Adds a type hint to `ModernVBertForMaskedLM.__init__` - Removes `tie_word_embeddings` from `Qwen2VLTextConfig` (and therefore also `Qwen2_5_VLTextConfig`) because it's not valid for these models - Remove hack from `ColQwen2Config` (and t\u2026", + "changed_files": 6, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44727", - "created_at": "2026-03-15T19:41:24Z", - "deletions": 33, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44976", + "created_at": "2026-03-24T19:26:33Z", + "deletions": 10, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44727/files", - "html_url": "https://github.com/huggingface/transformers/pull/44727", - "labels": [ - "Code agent slop" - ], - "merged": false, - "number": 44727, - "review_comments_count": 0, + "files_url": "https://github.com/huggingface/transformers/pull/44976/files", + "html_url": "https://github.com/huggingface/transformers/pull/44976", + "labels": [], + "merged": true, + "number": 44976, + "review_comments_count": 3, "state": "closed", - "title": "fix: AutoProcessor.from_pretrained not passing kwargs to cached_file", - "updated_at": "2026-03-18T13:15:46Z" + "title": "Fix tie_word_embedding issues with `Qwen2VL`", + "updated_at": "2026-03-24T20:55:15Z" }, { - "additions": 198, - "author": "LincolnBurrows2017", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "Replaced bare except clause with except Exception in _safe_convert_tensor function to follow Python best practices (PEP 8).", - "changed_files": 10, + "additions": 6971, + "author": "philippguevorguian", + "author_association": "NONE", + "body_excerpt": null, + "changed_files": 20, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44725", - "created_at": "2026-03-15T17:41:18Z", - "deletions": 29, + "conversation_url": "https://github.com/huggingface/transformers/pull/44975", + "created_at": "2026-03-24T17:12:31Z", + "deletions": 2, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44725/files", - "html_url": "https://github.com/huggingface/transformers/pull/44725", - "labels": [ - "Code agent slop" - ], + "files_url": "https://github.com/huggingface/transformers/pull/44975/files", + "html_url": "https://github.com/huggingface/transformers/pull/44975", + "labels": [], "merged": false, - "number": 44725, + "number": 44975, "review_comments_count": 0, "state": "closed", - "title": "fix: replace bare except with Exception in Fuyu image processing", - "updated_at": "2026-03-18T13:16:22Z" + "title": "fix: rebase main; clean config reads, ImageProcessor backend, misc cleanup", + "updated_at": "2026-03-24T17:13:42Z" }, { - "additions": 6, - "author": "ydshieh", + "additions": 1084, + "author": "3outeille", "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? TO be explained.", - "changed_files": 5, + "body_excerpt": "TODO: - Saving seems to take a bit of time tho. Need investigation - Need to check if it works in 1D (FSDP or TP)and 2D (FSDP + TP). Running the script from https://github.com/huggingface/transformers/pull/44996 ``` (env_pr-44974-fsdp-core\u2026", + "changed_files": 12, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44724", - "created_at": "2026-03-15T17:14:12Z", - "deletions": 5, - "draft": true, - "files_url": "https://github.com/huggingface/transformers/pull/44724/files", - "html_url": "https://github.com/huggingface/transformers/pull/44724", + "conversation_url": "https://github.com/huggingface/transformers/pull/44974", + "created_at": "2026-03-24T16:13:25Z", + "deletions": 332, + "draft": false, + "files_url": "https://github.com/huggingface/transformers/pull/44974/files", + "html_url": "https://github.com/huggingface/transformers/pull/44974", "labels": [], "merged": false, - "number": 44724, - "review_comments_count": 1, + "number": 44974, + "review_comments_count": 0, "state": "open", - "title": "Fix some missing / incorrect entries in auto files", - "updated_at": "2026-03-16T09:59:56Z" + "title": "Refactor core_model_loading to support FSDP shard-on-read loading", + "updated_at": "2026-03-26T18:04:53Z" }, { - "additions": 12, - "author": "aashirpersonal", - "author_association": "NONE", - "body_excerpt": "## Summary This PR fixes #44716 by exposing and forwarding `interpolate_pos_encoding` through the Pixio embedding/model call chain so the option is actually usable from `PixioModel.forward()`. ### Changes - Added `interpolate_pos_encoding:\u2026", - "changed_files": 2, + "additions": 22, + "author": "andylizf", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "## What does this PR do? Adds `.item()` to `max_seqlen = (cu_seqlens[1:] - cu_seqlens[:-1]).max()` in all vision attention modules that pass this value to `flash_attn_varlen_func`. ### Context On **released versions** (e.g. 4.52.4), using\u2026", + "changed_files": 19, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44723", - "created_at": "2026-03-15T16:52:03Z", - "deletions": 6, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44723/files", - "html_url": "https://github.com/huggingface/transformers/pull/44723", - "labels": [ - "Code agent slop" - ], - "merged": false, - "number": 44723, - "review_comments_count": 0, - "state": "closed", - "title": "Fix: propagate interpolate_pos_encoding through PixioEmbeddings and PixioModel", - "updated_at": "2026-03-18T15:05:52Z" - }, - { - "additions": 38, - "author": "chandan11248", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "## What does this PR do? Migrates the GPT-J model to use the new `@capture_outputs` and `@can_return_tuple` decorators for standardized output collection, as described in #43979. ### Changes - Added `_can_record_outputs` to `GPTJPreTrained\u2026", - "changed_files": 2, - "cluster_id": "cluster-43979-28", - "cluster_ids": [ - "cluster-43979-28" - ], - "cluster_role": "member", - "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44722", - "created_at": "2026-03-15T15:33:25Z", - "deletions": 110, + "conversation_url": "https://github.com/huggingface/transformers/pull/44973", + "created_at": "2026-03-24T15:42:32Z", + "deletions": 22, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44722/files", - "html_url": "https://github.com/huggingface/transformers/pull/44722", + "files_url": "https://github.com/huggingface/transformers/pull/44973/files", + "html_url": "https://github.com/huggingface/transformers/pull/44973", "labels": [], "merged": false, - "number": 44722, + "number": 44973, "review_comments_count": 0, "state": "open", - "title": "Refactor gptj output tracing to use standardized decorators", - "updated_at": "2026-03-19T18:12:59Z" + "title": "Fix max_seqlen type in vision attention for torch.compile + FA2", + "updated_at": "2026-03-25T14:12:50Z" }, { - "additions": 4, - "author": "rsmed31", - "author_association": "NONE", - "body_excerpt": "## Summary Fixes #44716 `PixioPatchEmbeddings.forward` already accepted `interpolate_pos_encoding` but it was silently dropped \u2014 never passed from `PixioEmbeddings.forward` or `PixioModel.forward`, making the parameter effectively unusable\u2026", - "changed_files": 1, + "additions": 17, + "author": "Abdennacer-Badaoui", + "author_association": "MEMBER", + "body_excerpt": "As per title. Updating Gemma3/Gemma3n expectations.", + "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44718", - "created_at": "2026-03-14T23:57:14Z", - "deletions": 3, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44972", + "created_at": "2026-03-24T15:11:50Z", + "deletions": 12, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44718/files", - "html_url": "https://github.com/huggingface/transformers/pull/44718", + "files_url": "https://github.com/huggingface/transformers/pull/44972/files", + "html_url": "https://github.com/huggingface/transformers/pull/44972", "labels": [], - "merged": false, - "number": 44718, - "review_comments_count": 0, + "merged": true, + "number": 44972, + "review_comments_count": 10, "state": "closed", - "title": "Fix: propagate interpolate_pos_encoding through PixioEmbeddings and PixioModel", - "updated_at": "2026-03-15T17:58:58Z" + "title": "[AMD CI] Gemma3/Gemma3n Expectations", + "updated_at": "2026-03-24T16:30:03Z" }, { - "additions": 15, - "author": "ydshieh", + "additions": 0, + "author": "ArthurZucker", "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? As discussed internally, some component model classes didn't specify the correct config classes. This PR fixes them (those I could found - because the tiny model creation script fails due to those mistakes).", - "changed_files": 7, + "body_excerpt": "# What does this PR do? Removed the tokenizer_class attr was never there to begin with, and kwargs are now supported. This was failing some test on vllm ci. Fixes https://buildkite.com/vllm/ci/builds/57601/steps/canvas?sid=019d1aec-aa5a-41\u2026", + "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 4, - "conversation_url": "https://github.com/huggingface/transformers/pull/44715", - "created_at": "2026-03-14T21:11:52Z", - "deletions": 2, + "comments_count": 3, + "conversation_url": "https://github.com/huggingface/transformers/pull/44971", + "created_at": "2026-03-24T14:59:36Z", + "deletions": 11, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44715/files", - "html_url": "https://github.com/huggingface/transformers/pull/44715", + "files_url": "https://github.com/huggingface/transformers/pull/44971/files", + "html_url": "https://github.com/huggingface/transformers/pull/44971", "labels": [], "merged": true, - "number": 44715, - "review_comments_count": 0, + "number": 44971, + "review_comments_count": 1, "state": "closed", - "title": "Fix missing / incorrect `config` class in some model class definitions", - "updated_at": "2026-03-15T11:19:51Z" + "title": "[ `vllm x v5`] nit", + "updated_at": "2026-03-24T17:40:05Z" }, { - "additions": 181, - "author": "LincolnBurrows2017", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "## Summary Fixes issue #44625: Qwen3.5 num_labels not propagating from core config to text_config. When calling `AutoConfig.from_pretrained(\"Qwen3.5\", num_labels=1)`, the main config gets `num_labels=1` but `text_config` still has default\u2026", - "changed_files": 8, + "additions": 20, + "author": "IlyasMoutawwakil", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? save locally --> local locally) ```\u2026", + "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44494", - "created_at": "2026-03-06T12:57:25Z", - "deletions": 11, + "comments_count": 3, + "conversation_url": "https://github.com/huggingface/transformers/pull/44730", + "created_at": "2026-03-15T20:44:32Z", + "deletions": 4, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44494/files", - "html_url": "https://github.com/huggingface/transformers/pull/44494", + "files_url": "https://github.com/huggingface/transformers/pull/44730/files", + "html_url": "https://github.com/huggingface/transformers/pull/44730", "labels": [], "merged": true, - "number": 44494, - "review_comments_count": 3, + "number": 44730, + "review_comments_count": 6, "state": "closed", - "title": "Update `ty` to 0.0.20", - "updated_at": "2026-03-06T13:30:25Z" + "title": "Fix `mlcd` auto config/model/mapping issues", + "updated_at": "2026-03-16T12:12:30Z" }, { - "additions": 439, - "author": "SunMarc", + "additions": 214, + "author": "xenova", "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? Since I removed some folders (fsdp, deepspeed) related to training, I need to modify the workflows !", - "changed_files": 18, + "body_excerpt": "# What does this PR do? This PR introduces a helper utility function, `int_div_ceil`, which performs `math.ceil(a / b)` for non-negative integer operands. This is necessary as the current approach is both error-prone and imprecise (especia\u2026", + "changed_files": 58, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44491", - "created_at": "2026-03-06T11:15:42Z", - "deletions": 647, + "comments_count": 3, + "conversation_url": "https://github.com/huggingface/transformers/pull/44729", + "created_at": "2026-03-15T20:29:38Z", + "deletions": 225, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44491/files", - "html_url": "https://github.com/huggingface/transformers/pull/44491", + "files_url": "https://github.com/huggingface/transformers/pull/44729/files", + "html_url": "https://github.com/huggingface/transformers/pull/44729", "labels": [], - "merged": true, - "number": 44491, - "review_comments_count": 3, - "state": "closed", - "title": "Fix training ci and clean some tests", - "updated_at": "2026-03-11T16:27:57Z" + "merged": false, + "number": 44729, + "review_comments_count": 0, + "state": "open", + "title": "Avoid floating point math for ceil operations", + "updated_at": "2026-03-15T20:49:34Z" }, { - "additions": 4, - "author": "kaixuanliu", - "author_association": "CONTRIBUTOR", - "body_excerpt": "@ArthurZucker @Cyrilvallez pls help review, thx! This PR fixes failed test case: `pytest -rA tests/models/eurobert/test_modeling_eurobert.py::EuroBertModelTest::test_model_parallelism`", + "additions": 88, + "author": "ajmeese7", + "author_association": "NONE", + "body_excerpt": "# What does this PR do? Fixes a GPU memory leak in `Bnb4bitQuantize.convert()` where float16 source tensors are never freed during 4-bit quantized model loading via `from_pretrained`, causing OOM on models whose float16 size exceeds GPU VR\u2026", "changed_files": 2, - "cluster_id": "cluster-43324-12", - "cluster_ids": [ - "cluster-43324-12" - ], - "cluster_role": "member", - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44490", - "created_at": "2026-03-06T10:56:48Z", - "deletions": 0, + "cluster_id": null, + "cluster_ids": [], + "cluster_role": null, + "comments_count": 4, + "conversation_url": "https://github.com/huggingface/transformers/pull/44728", + "created_at": "2026-03-15T19:56:44Z", + "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44490/files", - "html_url": "https://github.com/huggingface/transformers/pull/44490", + "files_url": "https://github.com/huggingface/transformers/pull/44728/files", + "html_url": "https://github.com/huggingface/transformers/pull/44728", "labels": [], - "merged": true, - "number": 44490, + "merged": false, + "number": 44728, "review_comments_count": 0, "state": "closed", - "title": "fix model parallelism bug for eurobert model", - "updated_at": "2026-03-06T14:16:41Z" + "title": "Fix float16 memory leak during 4-bit quantized model loading", + "updated_at": "2026-03-16T20:53:54Z" }, { - "additions": 310, - "author": "tarekziade", - "author_association": "MEMBER", - "body_excerpt": "This PR makes `.ai` the single source of truth for agent templates and skills, and adds explicit `Makefile` targets to generate `Codex` and `Claude Code` specific artifacts. It contains a first skill aimed at properly dealing with typing e\u2026", - "changed_files": 7, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 6, - "conversation_url": "https://github.com/huggingface/transformers/pull/44489", - "created_at": "2026-03-06T08:42:12Z", - "deletions": 62, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44489/files", - "html_url": "https://github.com/huggingface/transformers/pull/44489", - "labels": [], - "merged": true, - "number": 44489, - "review_comments_count": 2, - "state": "closed", - "title": "Centralize AI agent templates in `.ai`", - "updated_at": "2026-03-18T14:17:22Z" - }, - { - "additions": 482, - "author": "abhijeet-dhumal", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? Fixes #44486 Adds `KubeflowCallback` to enable automatic progress and metrics reporting for training jobs running on [Kubeflow Trainer](https://github.com/kubeflow/trainer). When training runs inside a Kubeflow Trai\u2026", - "changed_files": 6, + "additions": 202, + "author": "LincolnBurrows2017", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "Fixed issue where kwargs like force_download, proxies, token were not being passed to cached_file function.", + "changed_files": 11, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44487", - "created_at": "2026-03-06T08:31:30Z", - "deletions": 1, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44727", + "created_at": "2026-03-15T19:41:24Z", + "deletions": 33, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44487/files", - "html_url": "https://github.com/huggingface/transformers/pull/44487", - "labels": [], - "merged": true, - "number": 44487, - "review_comments_count": 8, + "files_url": "https://github.com/huggingface/transformers/pull/44727/files", + "html_url": "https://github.com/huggingface/transformers/pull/44727", + "labels": [ + "Code agent slop" + ], + "merged": false, + "number": 44727, + "review_comments_count": 0, "state": "closed", - "title": "feat(integration): Add KubeflowCallback to enable automatic progress \u2026", - "updated_at": "2026-03-18T14:58:23Z" + "title": "fix: AutoProcessor.from_pretrained not passing kwargs to cached_file", + "updated_at": "2026-03-18T13:15:46Z" }, { - "additions": 691, - "author": "kaixuanliu", - "author_association": "CONTRIBUTOR", - "body_excerpt": "@IlyasMoutawwakil pls help review, thx!", - "changed_files": 1, + "additions": 198, + "author": "LincolnBurrows2017", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "Replaced bare except clause with except Exception in _safe_convert_tensor function to follow Python best practices (PEP 8).", + "changed_files": 10, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44482", - "created_at": "2026-03-06T02:39:41Z", - "deletions": 332, + "conversation_url": "https://github.com/huggingface/transformers/pull/44725", + "created_at": "2026-03-15T17:41:18Z", + "deletions": 29, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44482/files", - "html_url": "https://github.com/huggingface/transformers/pull/44482", - "labels": [], - "merged": true, - "number": 44482, + "files_url": "https://github.com/huggingface/transformers/pull/44725/files", + "html_url": "https://github.com/huggingface/transformers/pull/44725", + "labels": [ + "Code agent slop" + ], + "merged": false, + "number": 44725, "review_comments_count": 0, "state": "closed", - "title": "add XPU Expectations for higgs_audio_v2 tests", - "updated_at": "2026-03-10T08:38:56Z" + "title": "fix: replace bare except with Exception in Fuyu image processing", + "updated_at": "2026-03-18T13:16:22Z" }, { - "additions": 2353, - "author": "XingyuHu109", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "## Summary This PR adds native Transformers support for DeepSeek-V3.2. It introduces a new `deepseek_v32` model family so the official checkpoints resolve through the standard auto classes without `trust_remote_code`. The implementation ke\u2026", - "changed_files": 19, + "additions": 6, + "author": "ydshieh", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? TO be explained.", + "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44481", - "created_at": "2026-03-05T21:14:38Z", - "deletions": 30, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44481/files", - "html_url": "https://github.com/huggingface/transformers/pull/44481", + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44724", + "created_at": "2026-03-15T17:14:12Z", + "deletions": 5, + "draft": true, + "files_url": "https://github.com/huggingface/transformers/pull/44724/files", + "html_url": "https://github.com/huggingface/transformers/pull/44724", "labels": [], "merged": false, - "number": 44481, - "review_comments_count": 4, + "number": 44724, + "review_comments_count": 1, "state": "open", - "title": "Add native DeepSeek-V3.2 support", - "updated_at": "2026-03-12T16:02:46Z" + "title": "Fix some missing / incorrect entries in auto files", + "updated_at": "2026-03-16T09:59:56Z" }, { - "additions": 3, - "author": "ydshieh", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? add `diffusers` to docker file for `VibeVoice` (added in PR #40546).", - "changed_files": 1, + "additions": 12, + "author": "aashirpersonal", + "author_association": "NONE", + "body_excerpt": "## Summary This PR fixes #44716 by exposing and forwarding `interpolate_pos_encoding` through the Pixio embedding/model call chain so the option is actually usable from `PixioModel.forward()`. ### Changes - Added `interpolate_pos_encoding:\u2026", + "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44480", - "created_at": "2026-03-05T20:54:07Z", - "deletions": 0, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44723", + "created_at": "2026-03-15T16:52:03Z", + "deletions": 6, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44480/files", - "html_url": "https://github.com/huggingface/transformers/pull/44480", - "labels": [], - "merged": true, - "number": 44480, + "files_url": "https://github.com/huggingface/transformers/pull/44723/files", + "html_url": "https://github.com/huggingface/transformers/pull/44723", + "labels": [ + "Code agent slop" + ], + "merged": false, + "number": 44723, "review_comments_count": 0, "state": "closed", - "title": "Add `diffusers` to CI docker file", - "updated_at": "2026-03-05T21:11:17Z" + "title": "Fix: propagate interpolate_pos_encoding through PixioEmbeddings and PixioModel", + "updated_at": "2026-03-18T15:05:52Z" }, { - "additions": 116, - "author": "BenjaminBossan", - "author_association": "MEMBER", - "body_excerpt": "Required fixes: - some code was using unordered data structures, making weight order random - adjust alpha to offset increased rank from fusion - import functions from PEFT if available See https://github.com/huggingface/peft/pull/3083.", - "changed_files": 4, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44478", - "created_at": "2026-03-05T17:19:31Z", - "deletions": 26, + "additions": 38, + "author": "chandan11248", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "## What does this PR do? Migrates the GPT-J model to use the new `@capture_outputs` and `@can_return_tuple` decorators for standardized output collection, as described in #43979. ### Changes - Added `_can_record_outputs` to `GPTJPreTrained\u2026", + "changed_files": 2, + "cluster_id": "cluster-43979-28", + "cluster_ids": [ + "cluster-43979-28" + ], + "cluster_role": "member", + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44722", + "created_at": "2026-03-15T15:33:25Z", + "deletions": 110, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44478/files", - "html_url": "https://github.com/huggingface/transformers/pull/44478", + "files_url": "https://github.com/huggingface/transformers/pull/44722/files", + "html_url": "https://github.com/huggingface/transformers/pull/44722", "labels": [], - "merged": true, - "number": 44478, - "review_comments_count": 1, - "state": "closed", - "title": "[WIP] FIX Make Mixtral LoRA loading work", - "updated_at": "2026-03-11T17:44:20Z" + "merged": false, + "number": 44722, + "review_comments_count": 0, + "state": "open", + "title": "Refactor gptj output tracing to use standardized decorators", + "updated_at": "2026-03-19T18:12:59Z" }, { - "additions": 1, - "author": "Cyrilvallez", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? As per the title. It's quite a random rule to fix https://huggingface.co/fixie-ai/ultravox-v0_5-llama-3_2-1b to be honest", + "additions": 4, + "author": "rsmed31", + "author_association": "NONE", + "body_excerpt": "## Summary Fixes #44716 `PixioPatchEmbeddings.forward` already accepted `interpolate_pos_encoding` but it was silently dropped \u2014 never passed from `PixioEmbeddings.forward` or `PixioModel.forward`, making the parameter effectively unusable\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44477", - "created_at": "2026-03-05T16:58:29Z", - "deletions": 0, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44718", + "created_at": "2026-03-14T23:57:14Z", + "deletions": 3, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44477/files", - "html_url": "https://github.com/huggingface/transformers/pull/44477", + "files_url": "https://github.com/huggingface/transformers/pull/44718/files", + "html_url": "https://github.com/huggingface/transformers/pull/44718", "labels": [], "merged": false, - "number": 44477, + "number": 44718, "review_comments_count": 0, "state": "closed", - "title": "[vllm compat] Fix remote code inits", - "updated_at": "2026-03-11T10:34:06Z" + "title": "Fix: propagate interpolate_pos_encoding through PixioEmbeddings and PixioModel", + "updated_at": "2026-03-15T17:58:58Z" }, { - "additions": 4, - "author": "Rocketknight1", + "additions": 15, + "author": "ydshieh", "author_association": "MEMBER", - "body_excerpt": "I made an oversight in the fix at #43981 - I didn't realize the dim order changed for torch, so the test was still flaky for torch tensors. The fix reduced the flaky frequency a lot so I thought it had been fixed, but actually it's still t\u2026", - "changed_files": 1, + "body_excerpt": "# What does this PR do? As discussed internally, some component model classes didn't specify the correct config classes. This PR fixes them (those I could found - because the tiny model creation script fails due to those mistakes).", + "changed_files": 7, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44476", - "created_at": "2026-03-05T16:39:44Z", + "comments_count": 4, + "conversation_url": "https://github.com/huggingface/transformers/pull/44715", + "created_at": "2026-03-14T21:11:52Z", "deletions": 2, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44476/files", - "html_url": "https://github.com/huggingface/transformers/pull/44476", + "files_url": "https://github.com/huggingface/transformers/pull/44715/files", + "html_url": "https://github.com/huggingface/transformers/pull/44715", "labels": [], "merged": true, - "number": 44476, + "number": 44715, "review_comments_count": 0, "state": "closed", - "title": "Fix Llava tests for torch too!", - "updated_at": "2026-03-11T16:47:05Z" + "title": "Fix missing / incorrect `config` class in some model class definitions", + "updated_at": "2026-03-15T11:19:51Z" }, { - "additions": 1, - "author": "itazap", - "author_association": "MEMBER", - "body_excerpt": "chameleon added to MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS", - "changed_files": 1, + "additions": 181, + "author": "LincolnBurrows2017", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "## Summary Fixes issue #44625: Qwen3.5 num_labels not propagating from core config to text_config. When calling `AutoConfig.from_pretrained(\"Qwen3.5\", num_labels=1)`, the main config gets `num_labels=1` but `text_config` still has default\u2026", + "changed_files": 8, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44475", - "created_at": "2026-03-05T16:29:18Z", - "deletions": 0, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44714", + "created_at": "2026-03-14T20:42:46Z", + "deletions": 26, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44475/files", - "html_url": "https://github.com/huggingface/transformers/pull/44475", + "files_url": "https://github.com/huggingface/transformers/pull/44714/files", + "html_url": "https://github.com/huggingface/transformers/pull/44714", "labels": [], - "merged": true, - "number": 44475, + "merged": false, + "number": 44714, "review_comments_count": 0, "state": "closed", - "title": "chameleon added to MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS", - "updated_at": "2026-03-09T22:33:20Z" + "title": "fix: propagate num_labels to text_config for Qwen models", + "updated_at": "2026-03-18T12:56:27Z" }, { - "additions": 875, - "author": "JJJYmmm", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? Fix https://github.com/QwenLM/Qwen3.5/issues/58. In the latest code, Qwen3VL and Qwen3.5 use the same `get_rope_index` func of Qwen2VL. But they should be different since Qwen3VL/Qwen3.5 introduce text timestamps. T\u2026", - "changed_files": 9, + "additions": 15, + "author": "kulkarni-rohan", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "Applies the output tracing refactor to ColQwen2ForRetrieval as part of the broader effort tracked in issue #43979 to modernize output handling across all models in the library. Changes in both modular_colqwen2.py and modeling_colqwen2.py:\u2026", + "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 10, - "conversation_url": "https://github.com/huggingface/transformers/pull/44474", - "created_at": "2026-03-05T15:46:09Z", - "deletions": 107, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44713", + "created_at": "2026-03-14T20:20:14Z", + "deletions": 28, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44474/files", - "html_url": "https://github.com/huggingface/transformers/pull/44474", + "files_url": "https://github.com/huggingface/transformers/pull/44713/files", + "html_url": "https://github.com/huggingface/transformers/pull/44713", "labels": [], - "merged": true, - "number": 44474, - "review_comments_count": 10, - "state": "closed", - "title": "[Bugfix] fix video inference of qwen3vl and qwen3.5 series", - "updated_at": "2026-03-10T09:52:44Z" + "merged": false, + "number": 44713, + "review_comments_count": 0, + "state": "open", + "title": "[ColQwen2] Refactor output tracing (issue #43979)", + "updated_at": "2026-03-14T20:21:24Z" }, { - "additions": 137, - "author": "winglian", - "author_association": "COLLABORATOR", - "body_excerpt": "# What does this PR do? supersedes #44446 on `main`, when loading to cpu and using meta devices for non-rank0 processes, it now re-initializes weights on those processes as well as uses more CPU memory. In testing with loading llama3-8b. m\u2026", - "changed_files": 2, + "additions": 2, + "author": "ydshieh", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? torch 2.11 is going to be released soon, but we still use 2.9. Let's update it to 2.10 so at least a run with torch 2.10, before we update to torch 2.11 later.", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44473", - "created_at": "2026-03-05T14:52:15Z", - "deletions": 1, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44712", + "created_at": "2026-03-14T20:18:01Z", + "deletions": 2, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44473/files", - "html_url": "https://github.com/huggingface/transformers/pull/44473", + "files_url": "https://github.com/huggingface/transformers/pull/44712/files", + "html_url": "https://github.com/huggingface/transformers/pull/44712", "labels": [], "merged": true, - "number": 44473, - "review_comments_count": 4, + "number": 44712, + "review_comments_count": 0, "state": "closed", - "title": "fix FSDP loading with meta devices", - "updated_at": "2026-03-09T15:46:22Z" + "title": "Update Nvidia CI docker file to use torch 2.10", + "updated_at": "2026-03-14T20:29:04Z" }, { - "additions": 13, - "author": "jblox26", + "additions": 339, + "author": "anuq", "author_association": "NONE", - "body_excerpt": "## What does this fix? Running video inference with any `Qwen3VL` model raises `StopIteration` during `model.generate()`: ``` File \".../transformers/models/qwen3_vl/modeling_qwen3_vl.py\", line 1126, in get_rope_index grid_thw = next(grid_i\u2026", + "body_excerpt": "## What does this PR do? Fixes #35141. When `tie_word_embeddings=False`, calling `resize_token_embeddings()` creates a new `nn.Linear` for the LM head via `_get_resized_lm_head()`. The new module's weight and bias tensors do **not** carry\u2026", + "changed_files": 4, + "cluster_id": null, + "cluster_ids": [], + "cluster_role": null, + "comments_count": 3, + "conversation_url": "https://github.com/huggingface/transformers/pull/44711", + "created_at": "2026-03-14T19:21:21Z", + "deletions": 205, + "draft": false, + "files_url": "https://github.com/huggingface/transformers/pull/44711/files", + "html_url": "https://github.com/huggingface/transformers/pull/44711", + "labels": [ + "Code agent slop" + ], + "merged": false, + "number": 44711, + "review_comments_count": 0, + "state": "closed", + "title": "fix: mark new lm_head params as `_is_hf_initialized` after `resize_token_embeddings`", + "updated_at": "2026-03-20T13:36:58Z" + }, + { + "additions": 12, + "author": "he-yufeng", + "author_association": "CONTRIBUTOR", + "body_excerpt": "## What does this PR do? Fixes `AutoProcessor.from_pretrained` silently dropping hub kwargs like `force_download`, `cache_dir`, `token`, `revision`, etc. ### The bug The existing code on line ~300 filters kwargs using `inspect.signature(ca\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, - "conversation_url": "https://github.com/huggingface/transformers/pull/44472", - "created_at": "2026-03-05T14:50:06Z", - "deletions": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44710", + "created_at": "2026-03-14T18:33:53Z", + "deletions": 2, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44472/files", - "html_url": "https://github.com/huggingface/transformers/pull/44472", + "files_url": "https://github.com/huggingface/transformers/pull/44710/files", + "html_url": "https://github.com/huggingface/transformers/pull/44710", "labels": [], - "merged": false, - "number": 44472, + "merged": true, + "number": 44710, "review_comments_count": 0, "state": "closed", - "title": "Fix Qwen3VL get_rope_index StopIteration with per-frame video tokens", - "updated_at": "2026-03-06T15:15:58Z" + "title": "Fix AutoProcessor.from_pretrained silently dropping hub kwargs", + "updated_at": "2026-03-25T18:13:14Z" }, { - "additions": 50, - "author": "weiguangli-io", - "author_association": "CONTRIBUTOR", - "body_excerpt": "## What does this PR do? Fixes #44466 After `.to(device)`, PyTorch's `Module._apply` may create new `Parameter` objects that no longer share storage with tied weights. This caused `remove_tied_weights_from_state_dict` to fail to detect and\u2026", - "changed_files": 2, + "additions": 6778, + "author": "LucasMa2025", + "author_association": "FIRST_TIMER", + "body_excerpt": "# \ud83c\udf9b\ufe0f Add Configurable Generation Scheduler and State Machine for `generate()` ## Summary This PR introduces a **fully optional, zero-intrusion** Generation Scheduler (`GenerationScheduler`) and explicit state machine (`GenerationStateMachi\u2026", + "changed_files": 15, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44471", - "created_at": "2026-03-05T14:30:17Z", + "conversation_url": "https://github.com/huggingface/transformers/pull/44708", + "created_at": "2026-03-14T17:13:34Z", + "deletions": 7, + "draft": true, + "files_url": "https://github.com/huggingface/transformers/pull/44708/files", + "html_url": "https://github.com/huggingface/transformers/pull/44708", + "labels": [], + "merged": false, + "number": 44708, + "review_comments_count": 0, + "state": "closed", + "title": "Add Configurable Generation Scheduler and State Machine for `generate()`", + "updated_at": "2026-03-14T19:19:11Z" + }, + { + "additions": 3, + "author": "saivedant169", + "author_association": "NONE", + "body_excerpt": "Fixes part of #32937 ## What does this PR do? Adds `position_ids` as an explicit parameter to `MptForCausalLM.forward()` and `MptModel.forward()`, bringing MPT in line with other CausalLM models. Same rationale as the Bloom PR (#44706) \u2014 M\u2026", + "changed_files": 1, + "cluster_id": null, + "cluster_ids": [], + "cluster_role": null, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44707", + "created_at": "2026-03-14T17:12:16Z", "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44471/files", - "html_url": "https://github.com/huggingface/transformers/pull/44471", + "files_url": "https://github.com/huggingface/transformers/pull/44707/files", + "html_url": "https://github.com/huggingface/transformers/pull/44707", "labels": [ "Code agent slop" ], "merged": false, - "number": 44471, + "number": 44707, "review_comments_count": 0, "state": "closed", - "title": "Fix tied weights serialization being device-dependent", - "updated_at": "2026-03-06T14:03:18Z" + "title": "Add position_ids to MptForCausalLM forward pass", + "updated_at": "2026-03-18T13:39:36Z" }, { - "additions": 8, - "author": "weiguangli-io", - "author_association": "CONTRIBUTOR", - "body_excerpt": "Fixes #44360 The reference `fp8_index` kernel clamps per-head q\u00b7k scores with `T.max(logits, 0)` before the weighted sum across heads ([kernel.py#L241](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/inference/kernel.py#L241\u2026", - "changed_files": 2, + "additions": 3, + "author": "saivedant169", + "author_association": "NONE", + "body_excerpt": "Fixes part of #32937 ## What does this PR do? Adds `position_ids` as an explicit parameter to `BloomForCausalLM.forward()` and `BloomModel.forward()`, bringing Bloom in line with other CausalLM models like Llama, Falcon, Gemma, and Mistral\u2026", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44470", - "created_at": "2026-03-05T14:02:05Z", + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44706", + "created_at": "2026-03-14T17:09:11Z", "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44470/files", - "html_url": "https://github.com/huggingface/transformers/pull/44470", + "files_url": "https://github.com/huggingface/transformers/pull/44706/files", + "html_url": "https://github.com/huggingface/transformers/pull/44706", "labels": [ "Code agent slop" ], "merged": false, - "number": 44470, + "number": 44706, "review_comments_count": 0, "state": "closed", - "title": "Add missing ReLU in GlmMoeDsaIndexer", - "updated_at": "2026-03-05T15:39:38Z" + "title": "Add position_ids to BloomForCausalLM forward pass", + "updated_at": "2026-03-18T13:39:51Z" }, { - "additions": 4, - "author": "Cyrilvallez", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? For remote code that behave correctly with tied weights, we need to keep the same behavior as for the main lib, i.e. not remove them from tied weights (as tied weights are marked as missing to avoid inits!!)", + "additions": 14, + "author": "saivedant169", + "author_association": "NONE", + "body_excerpt": "Fixes part of #32937 ## What does this PR do? RoFormer introduced rotary position embeddings, but its `ForCausalLM` forward method doesn't accept `position_ids` \u2014 which means callers can't specify custom positions for packed sequences or f\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44469", - "created_at": "2026-03-05T13:51:55Z", - "deletions": 2, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44705", + "created_at": "2026-03-14T16:48:06Z", + "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44469/files", - "html_url": "https://github.com/huggingface/transformers/pull/44469", - "labels": [], - "merged": true, - "number": 44469, + "files_url": "https://github.com/huggingface/transformers/pull/44705/files", + "html_url": "https://github.com/huggingface/transformers/pull/44705", + "labels": [ + "Code agent slop" + ], + "merged": false, + "number": 44705, "review_comments_count": 0, "state": "closed", - "title": "[remote code/vllm] Fix incorrect tied weights", - "updated_at": "2026-03-05T15:07:56Z" + "title": "Add position_ids to RoFormerForCausalLM forward pass", + "updated_at": "2026-03-18T13:40:05Z" }, { - "additions": 13, - "author": "itazap", + "additions": 26, + "author": "vasqu", "author_association": "MEMBER", - "body_excerpt": "Replace placeholder tokens as specified in added_tokens_decoder if we have added_tokens_decoder with specific token_ids, we need to overwrite them in spm model ! example: [UNUSED_TOKEN_146] -> <|im_start|> see internlm2: https://huggingfac\u2026", - "changed_files": 1, + "body_excerpt": "As per title, it seems that the `cute` subfolder can be even distributed if you only install FA2 which implies something wrong. Now we check under the (normalized) distribution names", + "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44468", - "created_at": "2026-03-05T13:48:56Z", - "deletions": 0, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44703", + "created_at": "2026-03-14T14:46:02Z", + "deletions": 10, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44468/files", - "html_url": "https://github.com/huggingface/transformers/pull/44468", + "files_url": "https://github.com/huggingface/transformers/pull/44703/files", + "html_url": "https://github.com/huggingface/transformers/pull/44703", "labels": [], "merged": true, - "number": 44468, - "review_comments_count": 0, + "number": 44703, + "review_comments_count": 1, "state": "closed", - "title": "Replace placeholder tokens as specified in added_tokens_decoder", - "updated_at": "2026-03-05T16:29:13Z" + "title": "[`FA`] Fix fa detection", + "updated_at": "2026-03-14T17:19:07Z" }, { - "additions": 346, - "author": "itazap", - "author_association": "MEMBER", - "body_excerpt": "Replace placeholder tokens as specified in added_tokens_decoder if we have `added_tokens_decoder` with specific token_ids, we need to overwrite them in spm model ! `example: [UNUSED_TOKEN_146] -> <|im_start|>` see internlm2: https://huggin\u2026", - "changed_files": 24, + "additions": 148, + "author": "LincolnBurrows2017", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "## What does this PR fix? The `rms_norm_eps` parameter in `MistralConfig` was incorrectly typed as `int | None` but defaults to `1e-6` which is a float. This parameter is passed to `MistralRMSNorm` which expects `eps: float`. ### Bug Detai\u2026", + "changed_files": 8, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44467", - "created_at": "2026-03-05T13:44:54Z", - "deletions": 204, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44702", + "created_at": "2026-03-14T14:41:15Z", + "deletions": 25, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44467/files", - "html_url": "https://github.com/huggingface/transformers/pull/44467", - "labels": [], + "files_url": "https://github.com/huggingface/transformers/pull/44702/files", + "html_url": "https://github.com/huggingface/transformers/pull/44702", + "labels": [ + "Code agent slop" + ], "merged": false, - "number": 44467, + "number": 44702, "review_comments_count": 0, - "state": "open", - "title": "Placeholder tokens update", - "updated_at": "2026-03-05T13:47:28Z" + "state": "closed", + "title": "fix: Correct rms_norm_eps type hint from int to float in MistralConfig", + "updated_at": "2026-03-18T13:00:12Z" }, { - "additions": 20, - "author": "kashif", + "additions": 219, + "author": "hmellor", "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? Fix the loss calculation; we should calculate it on scaled targets. modular doesn't properly convert some files (e.g. kyutai) Also fixes red CI on main", + "author": "jnMetaCode", + "author_association": "CONTRIBUTOR", + "body_excerpt": "## Summary Fixes a `KeyError` crash in `_parse_type_hint` in `chat_template_utils.py` (line 117). When processing Union types, the code accesses `subtype[\"type\"]` without checking the key exists. `_get_json_schema_type(Any)` returns `{}` (\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44283", - "created_at": "2026-02-25T18:33:17Z", + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44525", + "created_at": "2026-03-08T09:21:27Z", "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44283/files", - "html_url": "https://github.com/huggingface/transformers/pull/44283", + "files_url": "https://github.com/huggingface/transformers/pull/44525/files", + "html_url": "https://github.com/huggingface/transformers/pull/44525", "labels": [], "merged": true, - "number": 44283, + "number": 44525, "review_comments_count": 0, "state": "closed", - "title": "[`Modular`] Fix file type regression", - "updated_at": "2026-02-25T20:04:41Z" + "title": "Fix KeyError in _parse_type_hint when Union contains Any", + "updated_at": "2026-03-09T13:43:23Z" }, { - "additions": 5, - "author": "Rocketknight1", - "author_association": "MEMBER", - "body_excerpt": "Response schema save-loading was broken in #40936, this PR restores it! I did most of this in #42300 but missed an issue with loading/saving.", + "additions": 1, + "author": "jnMetaCode", + "author_association": "CONTRIBUTOR", + "body_excerpt": "## Summary Fixes a bug in `AssistantTracker.is_active()` in `chat_template_utils.py`. After activation via `activate_tracker()`, `_rendered_blocks` and `_generation_indices` are set to list arguments which may be empty `[]`. The `is_active\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44282", - "created_at": "2026-02-25T17:57:54Z", - "deletions": 0, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44524", + "created_at": "2026-03-08T09:21:25Z", + "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44282/files", - "html_url": "https://github.com/huggingface/transformers/pull/44282", + "files_url": "https://github.com/huggingface/transformers/pull/44524/files", + "html_url": "https://github.com/huggingface/transformers/pull/44524", "labels": [], "merged": true, - "number": 44282, + "number": 44524, "review_comments_count": 0, "state": "closed", - "title": "Restore response_schema saving-loading", - "updated_at": "2026-02-25T18:27:22Z" + "title": "Fix AssistantTracker.is_active() returning False after activation with empty lists", + "updated_at": "2026-03-09T13:36:19Z" }, { - "additions": 1, - "author": "ArthurZucker", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? Its a very small fix for #44062", + "additions": 2, + "author": "jnMetaCode", + "author_association": "CONTRIBUTOR", + "body_excerpt": "## Summary Fixes two small bugs in `load_sharded_checkpoint` in `trainer_utils.py`: **Bug 1 \u2014 Copy-paste error in error message (line 1108):** When reporting unexpected keys, the error message incorrectly says \"Missing key(s)\" instead of \"\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44281", - "created_at": "2026-02-25T16:28:37Z", - "deletions": 0, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44523", + "created_at": "2026-03-08T09:21:22Z", + "deletions": 2, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44281/files", - "html_url": "https://github.com/huggingface/transformers/pull/44281", + "files_url": "https://github.com/huggingface/transformers/pull/44523/files", + "html_url": "https://github.com/huggingface/transformers/pull/44523", "labels": [], "merged": true, - "number": 44281, + "number": 44523, "review_comments_count": 0, "state": "closed", - "title": "Fix special token maps BC", - "updated_at": "2026-02-26T10:34:17Z" + "title": "Fix error message label and docstring default in load_sharded_checkpoint", + "updated_at": "2026-03-10T15:48:41Z" }, { - "additions": 614, - "author": "RishabhMehra", - "author_association": "FIRST_TIMER", - "body_excerpt": "# What does this PR do? - Adds an opt-in use_fast_grouping flag to TokenClassificationPipeline to enable a NumPy-vectorised BIO grouping path (~5\u00d7 faster on long sequences) while keeping the legacy path as default. - Improves correctness:\u2026", - "changed_files": 3, - "cluster_id": null, + "additions": 41, + "author": "nakigami", + "author_association": "NONE", + "body_excerpt": "# What does this PR do? This PR introduces initial unit test coverage for the `transformers-cli` tool, specifically focusing on diagnostic and model utility commands. Currently, these CLI entry points lack automated tests. These new tests\u2026", + "changed_files": 1, + "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44278", - "created_at": "2026-02-25T12:49:56Z", - "deletions": 63, + "conversation_url": "https://github.com/huggingface/transformers/pull/44520", + "created_at": "2026-03-08T01:30:39Z", + "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44278/files", - "html_url": "https://github.com/huggingface/transformers/pull/44278", + "files_url": "https://github.com/huggingface/transformers/pull/44520/files", + "html_url": "https://github.com/huggingface/transformers/pull/44520", "labels": [ "Code agent slop" ], "merged": false, - "number": 44278, + "number": 44520, "review_comments_count": 0, "state": "closed", - "title": "[FEAT] Pipelines - Faster group_entities", - "updated_at": "2026-02-25T13:54:58Z" + "title": "test(cli): add unit tests for env and model utility commands", + "updated_at": "2026-03-09T13:19:15Z" }, { - "additions": 171, - "author": "tarekziade", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? The GLM-ASR integration test in the documentation is a copy of the one in the test suite. This patch removes duplication by: - moving the tests in the docs using `runnables` - see https://github.com/huggingface/doc-\u2026", - "changed_files": 10, + "additions": 3, + "author": "Sai-Suraj-27", + "author_association": "CONTRIBUTOR", + "body_excerpt": "# What does this PR do? Fixes these failing [MarianIntegrationTests](https://github.com/huggingface/transformers/actions/runs/22606636929/job/65500458014#step:14:6186) \"image\" needs a test", - "changed_files": 36, + "additions": 16, + "author": "kushalkkb", + "author_association": "NONE", + "body_excerpt": "This PR improves error handling in the load_vocab function. Changes: - Added validation to ensure vocab_file is a string path - Added check for file existence - Raised clearer FileNotFoundError when vocabulary file is missing This improves\u2026", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 4, - "conversation_url": "https://github.com/huggingface/transformers/pull/44264", - "created_at": "2026-02-24T18:06:58Z", - "deletions": 210, - "draft": true, - "files_url": "https://github.com/huggingface/transformers/pull/44264/files", - "html_url": "https://github.com/huggingface/transformers/pull/44264", - "labels": [], + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44505", + "created_at": "2026-03-06T17:47:37Z", + "deletions": 0, + "draft": false, + "files_url": "https://github.com/huggingface/transformers/pull/44505/files", + "html_url": "https://github.com/huggingface/transformers/pull/44505", + "labels": [ + "Code agent slop" + ], "merged": false, - "number": 44264, - "review_comments_count": 3, - "state": "open", - "title": "[`Moe`] Enable aux loss automatically when in training + coef is not 0", - "updated_at": "2026-02-25T18:53:20Z" + "number": 44505, + "review_comments_count": 0, + "state": "closed", + "title": "Improve error handling in load_vocab for invalid vocabulary path", + "updated_at": "2026-03-10T04:14:31Z" }, { - "additions": 5882, - "author": "SunMarc", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? This PR refactor the common tests that we have in Trainer. I've mainly did the following: - Split the tests that we have in `test_trainer.py` into multiple files. - Fix common tests that were failing in the CI", - "changed_files": 18, + "additions": 13, + "author": "kushalkkb", + "author_association": "NONE", + "body_excerpt": "This PR improves error handling in the load_vocab function. Changes: - Added validation to ensure vocab_file is a string path - Added check for file existence - Raised clearer FileNotFoundError when vocabulary file is missing This improves\u2026", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44260", - "created_at": "2026-02-24T15:51:11Z", - "deletions": 6147, + "conversation_url": "https://github.com/huggingface/transformers/pull/44504", + "created_at": "2026-03-06T17:24:10Z", + "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44260/files", - "html_url": "https://github.com/huggingface/transformers/pull/44260", + "files_url": "https://github.com/huggingface/transformers/pull/44504/files", + "html_url": "https://github.com/huggingface/transformers/pull/44504", "labels": [], - "merged": true, - "number": 44260, - "review_comments_count": 3, + "merged": false, + "number": 44504, + "review_comments_count": 0, "state": "closed", - "title": "Update common tests Trainer", - "updated_at": "2026-02-27T17:31:59Z" + "title": "Improve error handling in load_vocab for invalid vocabulary path", + "updated_at": "2026-03-06T17:46:17Z" }, { - "additions": 1830, - "author": "winglian", - "author_association": "COLLABORATOR", - "body_excerpt": "# What does this PR do? This PR supersedes #43985 to replace the dataset/sampler/dataloader with a data producer that should allow us to more easily get to the next step of async training for RL. \"\". Then we compare `\"\" != \"LlamaTokenizer\"` (the `tokenizer_class` in `tokenizer_config.json`). Since that's true we earl\u2026", - "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 5, - "conversation_url": "https://github.com/huggingface/transformers/pull/44127", - "created_at": "2026-02-18T10:41:48Z", - "deletions": 8, + "comments_count": 4, + "conversation_url": "https://github.com/huggingface/transformers/pull/44436", + "created_at": "2026-03-04T15:26:48Z", + "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44127/files", - "html_url": "https://github.com/huggingface/transformers/pull/44127", + "files_url": "https://github.com/huggingface/transformers/pull/44436/files", + "html_url": "https://github.com/huggingface/transformers/pull/44436", "labels": [], "merged": true, - "number": 44127, - "review_comments_count": 0, + "number": 44436, + "review_comments_count": 4, "state": "closed", - "title": "AutoTokenizer ignores config when model_type is None", - "updated_at": "2026-02-18T14:47:52Z" + "title": "Fix continuous batching for multimodal models", + "updated_at": "2026-03-09T13:58:37Z" }, { - "additions": 17, - "author": "Cyrilvallez", + "additions": 138, + "author": "remi-or", "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? As per the title. Let's simplify after https://github.com/huggingface/transformers/pull/42848", - "changed_files": 2, + "body_excerpt": "This PR adds the option to have a ContinuousBatchingManager not be destroyed after generation is over. This allows the user to re-use the manager without requiring him to know any other entry point for CB apart from `generate_batch` or the\u2026", + "changed_files": 6, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44126", - "created_at": "2026-02-18T09:58:49Z", - "deletions": 40, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44435", + "created_at": "2026-03-04T14:17:08Z", + "deletions": 54, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44126/files", - "html_url": "https://github.com/huggingface/transformers/pull/44126", + "files_url": "https://github.com/huggingface/transformers/pull/44435/files", + "html_url": "https://github.com/huggingface/transformers/pull/44435", "labels": [], "merged": true, - "number": 44126, - "review_comments_count": 0, + "number": 44435, + "review_comments_count": 2, "state": "closed", - "title": "Simplify input preparation in generate", - "updated_at": "2026-02-18T10:30:48Z" + "title": "[CB] Persistent manager", + "updated_at": "2026-03-26T22:02:28Z" }, { - "additions": 8, - "author": "zucchini-nlp", + "additions": 413, + "author": "remi-or", "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? Fixes https://github.com/huggingface/transformers/issues/43986", - "changed_files": 1, + "body_excerpt": "This PR adds a dedicated config for continuous batching, which is starting to have a lot parameters. This will give the user a clear view of what is possible and make adding new parameters easier. No breaking changes through `account_for_c\u2026", + "changed_files": 9, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44125", - "created_at": "2026-02-18T09:34:54Z", - "deletions": 7, + "conversation_url": "https://github.com/huggingface/transformers/pull/44434", + "created_at": "2026-03-04T13:49:05Z", + "deletions": 303, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44125/files", - "html_url": "https://github.com/huggingface/transformers/pull/44125", + "files_url": "https://github.com/huggingface/transformers/pull/44434/files", + "html_url": "https://github.com/huggingface/transformers/pull/44434", "labels": [], "merged": true, - "number": 44125, - "review_comments_count": 2, + "number": 44434, + "review_comments_count": 12, "state": "closed", - "title": "Raise informative error when loading video processors", - "updated_at": "2026-02-20T08:23:35Z" + "title": "[CB] Add dedicated config", + "updated_at": "2026-03-13T13:56:40Z" }, { - "additions": 10, - "author": "mariam851", + "additions": 177, + "author": "leopold-tzafon", "author_association": "CONTRIBUTOR", - "body_excerpt": "Description: Adds eval_on_end to TrainingArguments to force evaluation at the end of training, even if the last step doesn't align with eval_steps. Changes: training_args.py: Added eval_on_end field. trainer.py: Added logic to call evaluat\u2026", - "changed_files": 2, + "body_excerpt": "# What does this PR do? Instead of silently failing when mm_token_type_ids is not passed, derives it in Qwen3 and Qwen3.5. Same as it was before: https://github.com/huggingface/transformers/commit/c281a2de8998e66e93fac30a236225528531df9b P\u2026", + "changed_files": 18, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44124", - "created_at": "2026-02-18T08:52:23Z", - "deletions": 0, + "comments_count": 9, + "conversation_url": "https://github.com/huggingface/transformers/pull/44433", + "created_at": "2026-03-04T13:46:14Z", + "deletions": 61, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44124/files", - "html_url": "https://github.com/huggingface/transformers/pull/44124", + "files_url": "https://github.com/huggingface/transformers/pull/44433/files", + "html_url": "https://github.com/huggingface/transformers/pull/44433", "labels": [], - "merged": false, - "number": 44124, + "merged": true, + "number": 44433, "review_comments_count": 0, "state": "closed", - "title": "feat: add eval_on_end to Trainer for final evaluation", - "updated_at": "2026-02-18T14:14:16Z" + "title": "fix: raise error if mm_token_type_ids not supplied ", + "updated_at": "2026-03-12T17:12:47Z" }, { - "additions": 15, - "author": "cyyever", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? This PR avoids device sync in training loss accumulation by ```torch.where```. The `is_torch_xla_available` condition is also removed.", - "changed_files": 1, + "additions": 85, + "author": "zucchini-nlp", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? As per title, instead of having to divide image token by norm scale, we can do it same way as in other model (eg. gemma3) and add a custom embed layer. It should be 100% BC because users usually call `self.embed_tok\u2026", + "changed_files": 8, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44123", - "created_at": "2026-02-18T08:22:57Z", - "deletions": 21, + "comments_count": 5, + "conversation_url": "https://github.com/huggingface/transformers/pull/44432", + "created_at": "2026-03-04T10:04:40Z", + "deletions": 38, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44123/files", - "html_url": "https://github.com/huggingface/transformers/pull/44123", + "files_url": "https://github.com/huggingface/transformers/pull/44432/files", + "html_url": "https://github.com/huggingface/transformers/pull/44432", "labels": [], - "merged": false, - "number": 44123, - "review_comments_count": 0, - "state": "open", - "title": "Avoid device sync in training loss accumulation", - "updated_at": "2026-02-20T04:43:19Z" - }, - { - "additions": 158, - "author": "adityuhkapoor", - "author_association": "NONE", - "body_excerpt": "# What does this PR do? Adds 4-bit embedding quantization for BitsAndBytes, mirroring TorchAO's existing `include_input_output_embeddings` and `untie_embedding_weights` pattern (PRs #37802, #37905, #37935). Large-vocabulary models (Llama 3\u2026", - "changed_files": 4, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44122", - "created_at": "2026-02-18T06:35:09Z", - "deletions": 2, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44122/files", - "html_url": "https://github.com/huggingface/transformers/pull/44122", - "labels": [ - "Code agent slop" - ], - "merged": false, - "number": 44122, - "review_comments_count": 0, - "state": "closed", - "title": "Add BnB 4-bit embedding quantization support", - "updated_at": "2026-02-18T14:27:25Z" - }, - { - "additions": 14, - "author": "tirth8205", - "author_association": "NONE", - "body_excerpt": "Fixes #34920 After applying `normalize()`, images can have negative values. Calling `resize()` on such images fails because it internally converts to PIL, which requires values in [0, 1] or [0, 255]. ### Fix When the image has values outsi\u2026", - "changed_files": 1, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44120", - "created_at": "2026-02-17T23:56:48Z", - "deletions": 0, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44120/files", - "html_url": "https://github.com/huggingface/transformers/pull/44120", - "labels": [ - "Code agent slop" - ], - "merged": false, - "number": 44120, + "merged": true, + "number": 44432, "review_comments_count": 0, "state": "closed", - "title": "fix: allow image_transforms.resize to handle negative values after normalization", - "updated_at": "2026-02-18T14:08:54Z" + "title": "Make paligemma embed tokens standard", + "updated_at": "2026-03-11T08:38:41Z" }, { - "additions": 1, - "author": "tirth8205", - "author_association": "NONE", - "body_excerpt": "Fixes #44117 `TOKENIZER_MAPPING_NAMES.get(config_model_type, \"\")` returns `None` when the key exists with value `None`, causing `AttributeError: 'NoneType' object has no attribute 'replace'` when loading models like `google/siglip2-so400m-\u2026", - "changed_files": 1, + "additions": 4103, + "author": "zucchini-nlp", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? Re-opening back a PR on cleaning up clip-like model's backbones. Let's merge it now, I've been seeing quite a lot of ppl reporting it and I am not sure when it will be resolved by the big vision refactor Basically,\u2026", + "changed_files": 42, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/44119", - "created_at": "2026-02-17T23:53:20Z", - "deletions": 1, + "comments_count": 21, + "conversation_url": "https://github.com/huggingface/transformers/pull/44431", + "created_at": "2026-03-04T10:02:13Z", + "deletions": 2230, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44119/files", - "html_url": "https://github.com/huggingface/transformers/pull/44119", + "files_url": "https://github.com/huggingface/transformers/pull/44431/files", + "html_url": "https://github.com/huggingface/transformers/pull/44431", "labels": [], "merged": false, - "number": 44119, - "review_comments_count": 0, - "state": "closed", - "title": "fix: handle None value from TOKENIZER_MAPPING_NAMES.get() in AutoTokenizer", - "updated_at": "2026-02-18T14:04:47Z" + "number": 44431, + "review_comments_count": 92, + "state": "open", + "title": "Refactor CLIP-like models", + "updated_at": "2026-04-02T16:15:36Z" }, { - "additions": 32, - "author": "tirth8205", + "additions": 0, + "author": "Rohang2005", "author_association": "NONE", - "body_excerpt": "## Fix Fixes #44079 When a `ModelOutput` dataclass field is initialized as `None`, it is correctly excluded from the OrderedDict keys. However, **subsequently setting that field to a non-None value** via attribute assignment (e.g. `outputs\u2026", - "changed_files": 2, + "body_excerpt": "## What does this PR do? This PR fixes an inconsistency in the AFMoE module where `past_key_values` was passed to a function argument expecting `past_key_value`. The function signature expects a singular cache object (`past_key_value`), bu\u2026", + "changed_files": 0, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/44118", - "created_at": "2026-02-17T23:31:31Z", + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44430", + "created_at": "2026-03-04T08:13:38Z", "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44118/files", - "html_url": "https://github.com/huggingface/transformers/pull/44118", + "files_url": "https://github.com/huggingface/transformers/pull/44430/files", + "html_url": "https://github.com/huggingface/transformers/pull/44430", "labels": [ "Code agent slop" ], "merged": false, - "number": 44118, + "number": 44430, "review_comments_count": 0, "state": "closed", - "title": "fix: ModelOutput keys not updated when setting previously-None dataclass fields", - "updated_at": "2026-02-18T14:18:12Z" + "title": "Fix inconsistent past_key_value/past_key_values usage in AFMoE modeling", + "updated_at": "2026-03-04T14:07:32Z" }, { - "additions": 27, - "author": "dtiourine", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "Migrate Flaubert to the @capture_outputs and @can_return_tuple decorator pattern for output handling, as part of #43979. # What does this PR do? - Add `_can_record_outputs = {\"attentions\": MultiHeadAttention}` on `FlaubertPreTrainedModel`\u2026", + "additions": 14, + "author": "thakoreh", + "author_association": "NONE", + "body_excerpt": "## Summary Fixes #44336 The `loading_report` module was using `PALETTE['italic']` and `PALETTE['bold']` directly in string formatting, which caused ANSI escape codes to be emitted even when stdout is not connected to a terminal (e.g., when\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44116", - "created_at": "2026-02-17T21:52:13Z", - "deletions": 102, + "comments_count": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44429", + "created_at": "2026-03-04T07:47:02Z", + "deletions": 6, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44116/files", - "html_url": "https://github.com/huggingface/transformers/pull/44116", - "labels": [], + "files_url": "https://github.com/huggingface/transformers/pull/44429/files", + "html_url": "https://github.com/huggingface/transformers/pull/44429", + "labels": [ + "Code agent slop" + ], "merged": false, - "number": 44116, + "number": 44429, "review_comments_count": 0, - "state": "open", - "title": "[WIP] [Flaubert] Refactor output tracing to decorator-based interface", - "updated_at": "2026-02-17T21:53:23Z" + "state": "closed", + "title": "Fix ANSI codes emitted in loading_report when stdout is not a TTY", + "updated_at": "2026-03-04T13:58:46Z" }, { - "additions": 2, - "author": "Deep-unlearning", - "author_association": "MEMBER", - "body_excerpt": "## Summary - Fix broken `[chat template](./chat_templating)` links in `docs/source/en/tasks/` - `./chat_templating` resolves within `tasks/` (doesn't exist); corrected to `../chat_templating` - Affected files: `tasks/image_text_to_text.md`\u2026", + "additions": 10, + "author": "kaixuanliu", + "author_association": "CONTRIBUTOR", + "body_excerpt": "@IlyasMoutawwakil pls help review, thx!", "changed_files": 2, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44115", - "created_at": "2026-02-17T21:32:55Z", - "deletions": 2, + "cluster_id": "cluster-43324-21", + "cluster_ids": [ + "cluster-43324-21" + ], + "cluster_role": "member", + "comments_count": 3, + "conversation_url": "https://github.com/huggingface/transformers/pull/44428", + "created_at": "2026-03-04T07:41:20Z", + "deletions": 3, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44115/files", - "html_url": "https://github.com/huggingface/transformers/pull/44115", + "files_url": "https://github.com/huggingface/transformers/pull/44428/files", + "html_url": "https://github.com/huggingface/transformers/pull/44428", "labels": [], "merged": true, - "number": 44115, - "review_comments_count": 0, + "number": 44428, + "review_comments_count": 1, "state": "closed", - "title": "[docs] fix broken chat_templating links in tasks docs", - "updated_at": "2026-02-23T16:27:57Z" + "title": "Add XPU Expectations for vibe voice acoustic tokenizer tests", + "updated_at": "2026-04-02T03:21:38Z" }, { - "additions": 716, - "author": "23atharvaS", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "## Summary This PR migrates the `wav2vec2` family to the standardized output-capturing interface (`@capture_outputs` + `@can_return_tuple`) and includes follow-up compatibility fixes required to make full CI green. ## What changed ### Core\u2026", - "changed_files": 19, + "additions": 43, + "author": "Jaredw2289-svg", + "author_association": "NONE", + "body_excerpt": "Fixes #44297 ## Problem `tokenizer.save_pretrained()` overwrites `tokenizer_class` in `tokenizer_config.json` with the current wrapper class (e.g. `PreTrainedTokenizerFast`) instead of preserving the original class from the loaded config (\u2026", + "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44114", - "created_at": "2026-02-17T21:17:35Z", - "deletions": 1237, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44427", + "created_at": "2026-03-04T06:03:56Z", + "deletions": 6, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44114/files", - "html_url": "https://github.com/huggingface/transformers/pull/44114", + "files_url": "https://github.com/huggingface/transformers/pull/44427/files", + "html_url": "https://github.com/huggingface/transformers/pull/44427", "labels": [], "merged": false, - "number": 44114, + "number": 44427, "review_comments_count": 0, - "state": "open", - "title": "Migrate wav2vec2, wav2vec2_conformer, and wav2vec2_bert to standardized output collection decorators", - "updated_at": "2026-02-18T20:34:53Z" + "state": "closed", + "title": "fix(tokenization): preserve original tokenizer_class in save_pretrained", + "updated_at": "2026-03-11T02:59:12Z" }, { - "additions": 5, - "author": "harshaljanjani", + "additions": 29, + "author": "kaixuanliu", "author_association": "CONTRIBUTOR", - "body_excerpt": "### What does this PR do? The following issue was identified and fixed in this PR: \u2192 Updates the stale `test_device_override` in `test_processing_granite_speech.py` to verify that the device param controls where speech inputs are placed, r\u2026", + "body_excerpt": "@IlyasMoutawwakil Can you help review? Thx!", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44113", - "created_at": "2026-02-17T20:01:32Z", - "deletions": 7, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44426", + "created_at": "2026-03-04T05:57:34Z", + "deletions": 10, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44113/files", - "html_url": "https://github.com/huggingface/transformers/pull/44113", + "files_url": "https://github.com/huggingface/transformers/pull/44426/files", + "html_url": "https://github.com/huggingface/transformers/pull/44426", "labels": [], "merged": true, - "number": 44113, + "number": 44426, "review_comments_count": 2, "state": "closed", - "title": "fix(testing): Update stale device override test in GraniteSpeech", - "updated_at": "2026-02-19T11:24:29Z" - }, - { - "additions": 30, - "author": "fumadari", - "author_association": "NONE", - "body_excerpt": "## Summary - Part of #43979 \u2014 refactors `poolformer` to use the `capture_outputs`, `can_return_tuple`, and `merge_with_config_defaults` decorators - Simplifies `PoolFormerLayer` to return a single tensor instead of a 1-tuple - Simplifies `\u2026", - "changed_files": 1, - "cluster_id": "cluster-44107-10", - "cluster_ids": [ - "cluster-44107-10" - ], - "cluster_role": "canonical", - "comments_count": 4, - "conversation_url": "https://github.com/huggingface/transformers/pull/44111", - "created_at": "2026-02-17T19:38:02Z", - "deletions": 59, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44111/files", - "html_url": "https://github.com/huggingface/transformers/pull/44111", - "labels": [], - "merged": false, - "number": 44111, - "review_comments_count": 0, - "state": "closed", - "title": "refactor(poolformer): use capture_outputs for output tracing", - "updated_at": "2026-02-18T21:19:22Z" + "title": "update the expected output for qwen2_5_vl w/ pytorch 2.10 XPU", + "updated_at": "2026-03-04T09:55:55Z" }, { - "additions": 28, - "author": "fumadari", - "author_association": "NONE", - "body_excerpt": "## Summary - Part of #43979 \u2014 refactors `tvp` to use the `capture_outputs`, `can_return_tuple`, and `merge_with_config_defaults` decorators - Simplifies `TvpAttention` to always return `(output, attention_probs)` (hooks decide what to capt\u2026", + "additions": 1, + "author": "qgallouedec", + "author_association": "MEMBER", + "body_excerpt": "I believe the second `if` should be `elif` so the else branch only triggers when neither the string-truncation NOR the float-formatting conditions apply. Otherwise it overwrites the truncation message with the original long string.", "changed_files": 1, - "cluster_id": "cluster-44107-10", - "cluster_ids": [ - "cluster-44107-10" - ], - "cluster_role": "member", - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44110", - "created_at": "2026-02-17T19:32:55Z", - "deletions": 101, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44110/files", - "html_url": "https://github.com/huggingface/transformers/pull/44110", - "labels": [], - "merged": false, - "number": 44110, - "review_comments_count": 0, - "state": "closed", - "title": "refactor(tvp): use capture_outputs for output tracing", - "updated_at": "2026-02-18T21:19:24Z" - }, - { - "additions": 48, - "author": "fumadari", - "author_association": "NONE", - "body_excerpt": "## Summary - Part of #43979 \u2014 refactors `hgnet_v2` to use the `capture_outputs` and `merge_with_config_defaults` decorators - Simplifies `HGNetV2Encoder` by removing `return_dict` parameter (always returns `BaseModelOutputWithNoAttention`)\u2026", - "changed_files": 2, - "cluster_id": "cluster-44107-10", - "cluster_ids": [ - "cluster-44107-10" - ], - "cluster_role": "member", - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44109", - "created_at": "2026-02-17T19:23:03Z", - "deletions": 87, + "cluster_id": null, + "cluster_ids": [], + "cluster_role": null, + "comments_count": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44425", + "created_at": "2026-03-04T02:48:00Z", + "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44109/files", - "html_url": "https://github.com/huggingface/transformers/pull/44109", + "files_url": "https://github.com/huggingface/transformers/pull/44425/files", + "html_url": "https://github.com/huggingface/transformers/pull/44425", "labels": [], "merged": false, - "number": 44109, + "number": 44425, "review_comments_count": 0, - "state": "closed", - "title": "refactor(hgnet_v2): use capture_outputs for output tracing", - "updated_at": "2026-02-18T21:19:25Z" + "state": "open", + "title": "Fix conditional check for float formatting", + "updated_at": "2026-03-04T02:48:41Z" }, { - "additions": 33, - "author": "fumadari", - "author_association": "NONE", - "body_excerpt": "## Summary - Adds `@merge_with_config_defaults` and `@capture_outputs` to both `VitDetModel` and `VitDetBackbone`, removing manual `output_attentions`/`return_dict` resolution - Adds `_can_record_outputs = {\"attentions\": VitDetAttention}`\u2026", + "additions": 6, + "author": "jw9603", + "author_association": "CONTRIBUTOR", + "body_excerpt": "## What does this PR do? Fixes `AttributeError: 'str' object has no attribute 'to'` when using `transformers serve --continuous-batching` with multimodal models like Qwen3.5-9B. `processor.apply_chat_template()` returns a plain string (not\u2026", "changed_files": 1, - "cluster_id": "cluster-44107-10", - "cluster_ids": [ - "cluster-44107-10" - ], - "cluster_role": "member", - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44108", - "created_at": "2026-02-17T19:15:00Z", - "deletions": 82, + "cluster_id": null, + "cluster_ids": [], + "cluster_role": null, + "comments_count": 5, + "conversation_url": "https://github.com/huggingface/transformers/pull/44424", + "created_at": "2026-03-04T00:56:08Z", + "deletions": 2, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44108/files", - "html_url": "https://github.com/huggingface/transformers/pull/44108", + "files_url": "https://github.com/huggingface/transformers/pull/44424/files", + "html_url": "https://github.com/huggingface/transformers/pull/44424", "labels": [], "merged": false, - "number": 44108, + "number": 44424, "review_comments_count": 0, "state": "closed", - "title": "refactor(vitdet): use output tracing decorators", - "updated_at": "2026-02-18T21:19:27Z" + "title": "Fix `transformers serve --continuous-batching` for multimodal models", + "updated_at": "2026-03-05T09:16:25Z" }, { - "additions": 40, - "author": "fumadari", - "author_association": "NONE", - "body_excerpt": "## Summary - Replaces manual `output_hidden_states`/`return_dict` resolution in `MraModel` with `@merge_with_config_defaults` and `@capture_outputs` decorators - Simplifies `MraEncoder` to a plain loop returning a single tensor, removing `\u2026", - "changed_files": 1, - "cluster_id": "cluster-44107-10", - "cluster_ids": [ - "cluster-44107-10" - ], - "cluster_role": "member", - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44107", - "created_at": "2026-02-17T19:04:42Z", - "deletions": 112, + "additions": 117, + "author": "mitre88", + "author_association": "CONTRIBUTOR", + "body_excerpt": "## What does this PR do? Adds a Spanish (es) translation of the `conversations.md` guide, which covers the fundamentals of using chat models in Transformers. ### Translated sections: - Chat CLI usage - TextGenerationPipeline in chat mode -\u2026", + "changed_files": 2, + "cluster_id": null, + "cluster_ids": [], + "cluster_role": null, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44422", + "created_at": "2026-03-04T00:42:43Z", + "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44107/files", - "html_url": "https://github.com/huggingface/transformers/pull/44107", + "files_url": "https://github.com/huggingface/transformers/pull/44422/files", + "html_url": "https://github.com/huggingface/transformers/pull/44422", "labels": [], - "merged": false, - "number": 44107, + "merged": true, + "number": 44422, "review_comments_count": 0, "state": "closed", - "title": "refactor(mra): use output tracing decorators", - "updated_at": "2026-02-18T21:19:29Z" + "title": "docs: add Spanish translation for conversations.md (chat basics)", + "updated_at": "2026-03-04T16:45:24Z" }, { - "additions": 47, - "author": "fumadari", - "author_association": "NONE", - "body_excerpt": "## Summary - Replace manual `hidden_states`/`attentions` collection in `YosoEncoder` with the `@capture_outputs` decorator and forward hooks - Add `@can_return_tuple` to all 5 wrapper model classes, eliminating manual `return_dict` handlin\u2026", - "changed_files": 1, + "additions": 309, + "author": "michaelbenayoun", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? When we shard weights according to a TP plan, we do not update the corresponding parent module attributes. For instance if we shard the weight of a `torch.nn.Linear`, we should also update its `in_features` or `out_\u2026", + "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44106", - "created_at": "2026-02-17T18:59:25Z", - "deletions": 132, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44421", + "created_at": "2026-03-03T22:51:47Z", + "deletions": 5, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44106/files", - "html_url": "https://github.com/huggingface/transformers/pull/44106", + "files_url": "https://github.com/huggingface/transformers/pull/44421/files", + "html_url": "https://github.com/huggingface/transformers/pull/44421", "labels": [], - "merged": false, - "number": 44106, + "merged": true, + "number": 44421, "review_comments_count": 0, "state": "closed", - "title": "Refactor yoso to use automatic output tracing", - "updated_at": "2026-02-18T21:19:30Z" + "title": "Update parent module attributes when sharding with TP", + "updated_at": "2026-03-05T23:32:06Z" }, { - "additions": 39, - "author": "fumadari", - "author_association": "NONE", - "body_excerpt": "## Summary - Replace manual `hidden_states`/`attentions` collection in `LiltEncoder` with the `@capture_outputs` decorator and forward hooks - Add `@can_return_tuple` to all 3 wrapper model classes, eliminating manual `return_dict` handlin\u2026", - "changed_files": 1, + "additions": 249, + "author": "stevhliu", + "author_association": "MEMBER", + "body_excerpt": "- removes \"Number of accelerators\" section from \"Accelerator selection\" guide since this is probably pretty commonly known - add a new \"DDP\" guide - refactored \"Accelerate\" guide with a more focused overview of what it is and how to config\u2026", + "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44105", - "created_at": "2026-02-17T18:54:40Z", - "deletions": 127, + "comments_count": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44420", + "created_at": "2026-03-03T22:41:59Z", + "deletions": 250, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44105/files", - "html_url": "https://github.com/huggingface/transformers/pull/44105", + "files_url": "https://github.com/huggingface/transformers/pull/44420/files", + "html_url": "https://github.com/huggingface/transformers/pull/44420", "labels": [], "merged": false, - "number": 44105, + "number": 44420, "review_comments_count": 0, - "state": "closed", - "title": "Refactor lilt to use automatic output tracing", - "updated_at": "2026-02-18T21:19:32Z" + "state": "open", + "title": "[docs] distributed training", + "updated_at": "2026-03-11T17:36:12Z" }, { - "additions": 66, - "author": "fumadari", - "author_association": "NONE", - "body_excerpt": "## Summary - Replace manual `hidden_states`/`attentions`/`cross_attentions` collection in `MegatronBertEncoder` with the `@capture_outputs` decorator and forward hooks - Add `@can_return_tuple` to all 8 wrapper model classes, eliminating m\u2026", + "additions": 6, + "author": "michaelbenayoun", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? To be merged after #44302 and https://github.com/huggingface/kernels/pull/285. It adds the `neuron` device in checks for custom kernels, enabling to load kernels for Neuron devices.", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44104", - "created_at": "2026-02-17T18:43:44Z", - "deletions": 207, + "comments_count": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44417", + "created_at": "2026-03-03T20:15:26Z", + "deletions": 6, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44104/files", - "html_url": "https://github.com/huggingface/transformers/pull/44104", + "files_url": "https://github.com/huggingface/transformers/pull/44417/files", + "html_url": "https://github.com/huggingface/transformers/pull/44417", "labels": [], - "merged": false, - "number": 44104, + "merged": true, + "number": 44417, "review_comments_count": 0, "state": "closed", - "title": "Refactor megatron_bert to use automatic output tracing", - "updated_at": "2026-02-18T21:19:34Z" + "title": "Neuron kernels integration", + "updated_at": "2026-03-05T17:09:39Z" }, { - "additions": 53, - "author": "engmohamedsalah", - "author_association": "NONE", - "body_excerpt": "Fixes #44052 Now and then, the indexer ran into trouble switching between masks and cache. Most of the test failures came from these hiccups: - Indexer cache: the old if seq_len > 1: reset cache heuristic broke assisted decoding (multi-tok\u2026", - "changed_files": 3, + "additions": 1, + "author": "tyler-romero", + "author_association": "CONTRIBUTOR", + "body_excerpt": "Register `olmo_hybrid` in `TOKENIZER_MAPPING_NAMES` so auto-tokenizer resolution works, matching the other auto-registrations already in place for this model.", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44103", - "created_at": "2026-02-17T18:04:48Z", - "deletions": 76, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44416", + "created_at": "2026-03-03T19:30:56Z", + "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44103/files", - "html_url": "https://github.com/huggingface/transformers/pull/44103", + "files_url": "https://github.com/huggingface/transformers/pull/44416/files", + "html_url": "https://github.com/huggingface/transformers/pull/44416", "labels": [], - "merged": false, - "number": 44103, - "review_comments_count": 0, + "merged": true, + "number": 44416, + "review_comments_count": 2, "state": "closed", - "title": "Fix glm_moe_dsa", - "updated_at": "2026-02-18T19:38:11Z" + "title": "[tiny] Add olmo_hybrid to tokenizer auto-mapping", + "updated_at": "2026-03-04T19:26:10Z" }, { - "additions": 42, - "author": "fumadari", - "author_association": "NONE", - "body_excerpt": "## Summary Refactors the `ibert` model to use the new `@capture_outputs` and `@can_return_tuple` decorators for output tracing, as part of the meta-issue #43979. **Key changes:** - Added `_can_record_outputs = {\"hidden_states\": IBertLayer,\u2026", + "additions": 2, + "author": "SunMarc", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? This PR removes @MekkCyber from the PR template. cc @Rocketknight1 you only need to ping me now ;)", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44102", - "created_at": "2026-02-17T17:21:32Z", - "deletions": 154, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44415", + "created_at": "2026-03-03T16:59:08Z", + "deletions": 2, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44102/files", - "html_url": "https://github.com/huggingface/transformers/pull/44102", + "files_url": "https://github.com/huggingface/transformers/pull/44415/files", + "html_url": "https://github.com/huggingface/transformers/pull/44415", "labels": [], - "merged": false, - "number": 44102, + "merged": true, + "number": 44415, "review_comments_count": 0, "state": "closed", - "title": "Refactor ibert output tracing with capture_outputs", - "updated_at": "2026-02-18T21:19:35Z" + "title": "Update PR template", + "updated_at": "2026-03-04T14:13:04Z" }, { - "additions": 210, - "author": "aman-coder03", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "## What does this PR do? This PR refactors XLM's output tracing to align with the standardized output capturing patterns used across the codebase. ### Key changes: - Refactors transformer blocks into a dedicated `XLMLayer` module to enable\u2026", - "changed_files": 2, + "additions": 35, + "author": "Cyrilvallez", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? Fixes https://github.com/huggingface/transformers/issues/44303 - see also comments here https://github.com/huggingface/transformers/pull/44316#issuecomment-3984362089. Supersedes https://github.com/huggingface/trans\u2026", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, + "comments_count": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44414", + "created_at": "2026-03-03T16:47:47Z", + "deletions": 39, + "draft": false, + "files_url": "https://github.com/huggingface/transformers/pull/44414/files", + "html_url": "https://github.com/huggingface/transformers/pull/44414", + "labels": [], + "merged": true, + "number": 44414, + "review_comments_count": 0, + "state": "closed", + "title": "Reduce tqdm verbosity during model loading", + "updated_at": "2026-03-03T16:57:56Z" + }, + { + "additions": 4, + "author": "Cyrilvallez", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? As per the title.", + "changed_files": 1, + "cluster_id": "cluster-44053-8", + "cluster_ids": [ + "cluster-44053-8" + ], + "cluster_role": "member", "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/44101", - "created_at": "2026-02-17T17:15:06Z", - "deletions": 194, + "conversation_url": "https://github.com/huggingface/transformers/pull/44413", + "created_at": "2026-03-03T16:24:43Z", + "deletions": 4, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44101/files", - "html_url": "https://github.com/huggingface/transformers/pull/44101", + "files_url": "https://github.com/huggingface/transformers/pull/44413/files", + "html_url": "https://github.com/huggingface/transformers/pull/44413", "labels": [], - "merged": false, - "number": 44101, + "merged": true, + "number": 44413, "review_comments_count": 0, - "state": "open", - "title": "[XLM] Refactor output tracing to align with capture_outputs standardized architecture", - "updated_at": "2026-02-19T08:08:33Z" + "state": "closed", + "title": "Fix peft conversion mappings", + "updated_at": "2026-03-03T17:08:39Z" }, { - "additions": 3, - "author": "qgallouedec", + "additions": 138, + "author": "tarekziade", "author_association": "MEMBER", - "body_excerpt": "In https://github.com/huggingface/trl/pull/5112 a user reported that `trl sft --help` fails It's because three inherited args from `TrainingArguments` (`torch_empty_cache_steps`, `gradient_checkpointing` and `use_liger_kernel`)help strings\u2026", - "changed_files": 1, + "body_excerpt": "# What does this PR do? Extends type checking to `src/transformers/quantizers`", + "changed_files": 28, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/44100", - "created_at": "2026-02-17T17:10:36Z", - "deletions": 3, + "comments_count": 25, + "conversation_url": "https://github.com/huggingface/transformers/pull/44412", + "created_at": "2026-03-03T14:53:31Z", + "deletions": 74, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/44100/files", - "html_url": "https://github.com/huggingface/transformers/pull/44100", + "files_url": "https://github.com/huggingface/transformers/pull/44412/files", + "html_url": "https://github.com/huggingface/transformers/pull/44412", "labels": [], "merged": true, - "number": 44100, - "review_comments_count": 0, + "number": 44412, + "review_comments_count": 33, "state": "closed", - "title": "Fix percentage formatting in help messages for gradient checkpointing, Liger Kernel, and empty cache steps", - "updated_at": "2026-02-20T09:57:51Z" + "title": "chore(typing): Add type checking to `src/transformers/quantizers`", + "updated_at": "2026-03-11T11:24:11Z" }, { - "additions": 2, - "author": "qgallouedec", + "additions": 59, + "author": "burtenshaw", "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? modular doesn't properly convert some files (e.g. kyutai) Also fixes red CI on main", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/43993", - "created_at": "2026-02-14T10:11:40Z", - "deletions": 12, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44283", + "created_at": "2026-02-25T18:33:17Z", + "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43993/files", - "html_url": "https://github.com/huggingface/transformers/pull/43993", + "files_url": "https://github.com/huggingface/transformers/pull/44283/files", + "html_url": "https://github.com/huggingface/transformers/pull/44283", "labels": [], "merged": true, - "number": 43993, + "number": 44283, "review_comments_count": 0, "state": "closed", - "title": "docs: fix typos across documentation files", - "updated_at": "2026-02-16T13:41:41Z" + "title": "[`Modular`] Fix file type regression", + "updated_at": "2026-02-25T20:04:41Z" }, { - "additions": 3, - "author": "taovinci0", - "author_association": "NONE", - "body_excerpt": "Replaces mutable default dict `weights={}` with `weights=None` and initializes inside the function. The dict is mutated via `weights[full_key] = w`, which can cause unexpected behavior across multiple calls.", + "additions": 5, + "author": "Rocketknight1", + "author_association": "MEMBER", + "body_excerpt": "Response schema save-loading was broken in #40936, this PR restores it! I did most of this in #42300 but missed an issue with loading/saving.", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/43991", - "created_at": "2026-02-14T00:00:00Z", - "deletions": 1, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44282", + "created_at": "2026-02-25T17:57:54Z", + "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43991/files", - "html_url": "https://github.com/huggingface/transformers/pull/43991", + "files_url": "https://github.com/huggingface/transformers/pull/44282/files", + "html_url": "https://github.com/huggingface/transformers/pull/44282", "labels": [], - "merged": false, - "number": 43991, + "merged": true, + "number": 44282, "review_comments_count": 0, "state": "closed", - "title": "fix: replace mutable default argument in _read_h5_weights", - "updated_at": "2026-02-16T11:18:06Z" + "title": "Restore response_schema saving-loading", + "updated_at": "2026-02-25T18:27:22Z" }, { - "additions": 10, - "author": "Abhijeetsingh610", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "## What does this PR do? Fixes a crash in `AutoVideoProcessor` when `torchvision` is unavailable. `VIDEO_PROCESSOR_MAPPING_NAMES` can contain `None`, and `video_processor_class_from_name` was doing `if class_name in extractors`, which rais\u2026", - "changed_files": 2, + "additions": 1, + "author": "ArthurZucker", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? Its a very small fix for #44062", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 5, - "conversation_url": "https://github.com/huggingface/transformers/pull/43989", - "created_at": "2026-02-13T20:48:03Z", + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44281", + "created_at": "2026-02-25T16:28:37Z", "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43989/files", - "html_url": "https://github.com/huggingface/transformers/pull/43989", + "files_url": "https://github.com/huggingface/transformers/pull/44281/files", + "html_url": "https://github.com/huggingface/transformers/pull/44281", "labels": [], - "merged": false, - "number": 43989, + "merged": true, + "number": 44281, "review_comments_count": 0, - "state": "open", - "title": "Fix AutoVideoProcessor class lookup when torchvision is unavailable", - "updated_at": "2026-02-18T17:52:34Z" + "state": "closed", + "title": "Fix special token maps BC", + "updated_at": "2026-02-26T10:34:17Z" }, { - "additions": 7, - "author": "harshaljanjani", - "author_association": "CONTRIBUTOR", - "body_excerpt": "### What does this PR do? The following failing tests were identified and fixed in this PR: \u2192 **LayoutXLM:** [This PR (rm slow tokenizers)](https://github.com/huggingface/transformers/pull/40936) changed [models/auto/tokenization_auto.py](\u2026", - "changed_files": 2, + "additions": 614, + "author": "RishabhMehra", + "author_association": "FIRST_TIMER", + "body_excerpt": "# What does this PR do? - Adds an opt-in use_fast_grouping flag to TokenClassificationPipeline to enable a NumPy-vectorised BIO grouping path (~5\u00d7 faster on long sequences) while keeping the legacy path as default. - Improves correctness:\u2026", + "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 7, - "conversation_url": "https://github.com/huggingface/transformers/pull/43988", - "created_at": "2026-02-13T20:03:28Z", - "deletions": 9, + "comments_count": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44278", + "created_at": "2026-02-25T12:49:56Z", + "deletions": 63, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43988/files", - "html_url": "https://github.com/huggingface/transformers/pull/43988", - "labels": [], - "merged": true, - "number": 43988, + "files_url": "https://github.com/huggingface/transformers/pull/44278/files", + "html_url": "https://github.com/huggingface/transformers/pull/44278", + "labels": [ + "Code agent slop" + ], + "merged": false, + "number": 44278, "review_comments_count": 0, "state": "closed", - "title": "fix(testing): Fix LayoutXLM tokenization test and LightOnOCR SDPA flash test failures on main CI", - "updated_at": "2026-02-23T14:07:59Z" + "title": "[FEAT] Pipelines - Faster group_entities", + "updated_at": "2026-02-25T13:54:58Z" }, { - "additions": 47, - "author": "winglian", - "author_association": "COLLABORATOR", - "body_excerpt": "# What does this PR do? Accelerator has a lot of other args that can be passed to it like fp8 support, etc, but requires extensive monkey patching downstream to make it work. This makes it easier to extend the accelerator args building met\u2026", - "changed_files": 1, + "additions": 105, + "author": "tarekziade", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? This patch makes the GLM-ASR doc example runnable by using `runnables` - see https://github.com/huggingface/doc-builder/blob/main/docs/runnable-code-blocks.md", + "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/43987", - "created_at": "2026-02-13T18:51:56Z", - "deletions": 38, + "comments_count": 36, + "conversation_url": "https://github.com/huggingface/transformers/pull/44277", + "created_at": "2026-02-25T08:49:20Z", + "deletions": 19, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43987/files", - "html_url": "https://github.com/huggingface/transformers/pull/43987", + "files_url": "https://github.com/huggingface/transformers/pull/44277/files", + "html_url": "https://github.com/huggingface/transformers/pull/44277", "labels": [], "merged": true, - "number": 43987, - "review_comments_count": 2, + "number": 44277, + "review_comments_count": 6, "state": "closed", - "title": "split out accelerator args builder method", - "updated_at": "2026-02-16T14:59:03Z" + "title": "Use doc-builder runnable example for GLM-ASR", + "updated_at": "2026-04-02T16:16:55Z" }, { - "additions": 1828, - "author": "winglian", - "author_association": "COLLABORATOR", - "body_excerpt": "# What does this PR do? The `_inner_training_loop` method has a lot going on which makes it hard to extend for downstream developers/libraries. This PR breaks it up into smaller well described methods that are chained in the training loop.\u2026", - "changed_files": 5, + "additions": 0, + "author": "vishalpatil-45", + "author_association": "NONE", + "body_excerpt": "# What does this PR do? This PR addresses the performance regression where `import transformers` takes ~3.5s. The issue was caused by eager imports of heavy backend libraries (like torch/numpy) during the initial module load. By moving the\u2026", + "changed_files": 0, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 5, - "conversation_url": "https://github.com/huggingface/transformers/pull/43985", - "created_at": "2026-02-13T17:55:01Z", - "deletions": 251, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44275", + "created_at": "2026-02-25T08:27:32Z", + "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43985/files", - "html_url": "https://github.com/huggingface/transformers/pull/43985", - "labels": [], + "files_url": "https://github.com/huggingface/transformers/pull/44275/files", + "html_url": "https://github.com/huggingface/transformers/pull/44275", + "labels": [ + "Code agent slop" + ], "merged": false, - "number": 43985, + "number": 44275, "review_comments_count": 0, "state": "closed", - "title": "Refactor inner training loop", - "updated_at": "2026-03-09T19:57:50Z" + "title": "[Fix] Restore lazy loading to improve import performance (#44273)", + "updated_at": "2026-02-25T20:37:18Z" }, { - "additions": 2, - "author": "materight", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? Removes unused `.squeeze` from VJEPA2 embeddings rotation. Currently the squeeze does nothing on video input since torch skips it if the dimension is not 1. Exporting to onnx and compiling to TensorRT instead fails\u2026", - "changed_files": 1, + "additions": 559, + "author": "paipeline", + "author_association": "NONE", + "body_excerpt": "## Description Fixes #44242 This PR resolves an issue where the auxiliary load balancing loss was not computed when `output_router_logits=False`, even when `router_aux_loss_coef != 0`. ## Problem The auxiliary loss computation was incorrec\u2026", + "changed_files": 6, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/43984", - "created_at": "2026-02-13T17:53:16Z", - "deletions": 2, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44274", + "created_at": "2026-02-25T06:38:02Z", + "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43984/files", - "html_url": "https://github.com/huggingface/transformers/pull/43984", - "labels": [], - "merged": true, - "number": 43984, - "review_comments_count": 0, - "state": "closed", - "title": "Remove unused squeeze from VJEPA2 embeddings rotation", - "updated_at": "2026-02-13T21:56:01Z" - }, - { - "additions": 62, - "author": "Aki-07", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? needs a test", + "changed_files": 36, + "cluster_id": null, + "cluster_ids": [], + "cluster_role": null, + "comments_count": 4, + "conversation_url": "https://github.com/huggingface/transformers/pull/44264", + "created_at": "2026-02-24T18:06:58Z", + "deletions": 210, "draft": true, - "files_url": "https://github.com/huggingface/transformers/pull/43973/files", - "html_url": "https://github.com/huggingface/transformers/pull/43973", + "files_url": "https://github.com/huggingface/transformers/pull/44264/files", + "html_url": "https://github.com/huggingface/transformers/pull/44264", "labels": [], "merged": false, - "number": 43973, - "review_comments_count": 0, + "number": 44264, + "review_comments_count": 3, "state": "open", - "title": "Add lfm2.5 audio", - "updated_at": "2026-02-21T16:42:21Z" + "title": "[`Moe`] Enable aux loss automatically when in training + coef is not 0", + "updated_at": "2026-02-25T18:53:20Z" }, { - "additions": 2219, - "author": "zucchini-nlp", + "additions": 5882, + "author": "SunMarc", "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? Following Ernie, we build 3d positions based on `mm_token_type_ids` and the models will return them by default from `processor`. We have a unified `get_vision_position` in the qwen2-vl model file, all other models j\u2026", - "changed_files": 45, + "body_excerpt": "# What does this PR do? This PR refactor the common tests that we have in Trainer. I've mainly did the following: - Split the tests that we have in `test_trainer.py` into multiple files. - Fix common tests that were failing in the CI", + "changed_files": 18, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 30, - "conversation_url": "https://github.com/huggingface/transformers/pull/43972", - "created_at": "2026-02-13T09:31:44Z", - "deletions": 1611, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44260", + "created_at": "2026-02-24T15:51:11Z", + "deletions": 6147, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43972/files", - "html_url": "https://github.com/huggingface/transformers/pull/43972", + "files_url": "https://github.com/huggingface/transformers/pull/44260/files", + "html_url": "https://github.com/huggingface/transformers/pull/44260", "labels": [], "merged": true, - "number": 43972, - "review_comments_count": 17, + "number": 44260, + "review_comments_count": 3, "state": "closed", - "title": ":rotating_light: Unify 3D position ids", - "updated_at": "2026-03-05T18:48:30Z" + "title": "Update common tests Trainer", + "updated_at": "2026-02-27T17:31:59Z" }, { - "additions": 65, - "author": "caffeinism", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? 1. According to the paper, this model is designed to reference 250 contexts (10 seconds), but the current implementation uses DynamicCache without employing create_sliding_window_causal_mask, causing it to reference\u2026", + "additions": 1830, + "author": "winglian", + "author_association": "COLLABORATOR", + "body_excerpt": "# What does this PR do? This PR supersedes #43985 to replace the dataset/sampler/dataloader with a data producer that should allow us to more easily get to the next step of async training for RL. \"\". Then we compare `\"\" != \"LlamaTokenizer\"` (the `tokenizer_class` in `tokenizer_config.json`). Since that's true we earl\u2026", + "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 6, - "conversation_url": "https://github.com/huggingface/transformers/pull/43839", - "created_at": "2026-02-08T12:21:19Z", - "deletions": 6, + "comments_count": 5, + "conversation_url": "https://github.com/huggingface/transformers/pull/44127", + "created_at": "2026-02-18T10:41:48Z", + "deletions": 8, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43839/files", - "html_url": "https://github.com/huggingface/transformers/pull/43839", + "files_url": "https://github.com/huggingface/transformers/pull/44127/files", + "html_url": "https://github.com/huggingface/transformers/pull/44127", "labels": [], "merged": true, - "number": 43839, + "number": 44127, "review_comments_count": 0, "state": "closed", - "title": "fix(moe): Handle dtype mismatch in torch._grouped_mm with autocast", - "updated_at": "2026-02-11T14:58:48Z" - }, - { - "additions": 2908, - "author": "mbtariq82", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "# What does this PR do? This PR adds Qwen3-ASR to the Transformers library. Fixes #43837 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Did you read the [co\u2026", - "changed_files": 15, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 17, - "conversation_url": "https://github.com/huggingface/transformers/pull/43838", - "created_at": "2026-02-08T12:05:43Z", - "deletions": 0, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43838/files", - "html_url": "https://github.com/huggingface/transformers/pull/43838", - "labels": [ - "New model", - "Audio" - ], - "merged": false, - "number": 43838, - "review_comments_count": 27, - "state": "open", - "title": "Proposal to add Qwen3-ASR support [WIP]", - "updated_at": "2026-03-20T17:14:42Z" + "title": "AutoTokenizer ignores config when model_type is None", + "updated_at": "2026-02-18T14:47:52Z" }, { - "additions": 79, - "author": "pragnyanramtha", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "Fixes #43824 what i think happened in #43824 is that waltwalt36 did not install the optional dependencies like pydantic, causing this issue. According to the core architecture docs, transformers implements a lazy loading mechanism for impo\u2026", - "changed_files": 1, + "additions": 17, + "author": "Cyrilvallez", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? As per the title. Let's simplify after https://github.com/huggingface/transformers/pull/42848", + "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/43836", - "created_at": "2026-02-08T11:28:31Z", - "deletions": 70, + "conversation_url": "https://github.com/huggingface/transformers/pull/44126", + "created_at": "2026-02-18T09:58:49Z", + "deletions": 40, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43836/files", - "html_url": "https://github.com/huggingface/transformers/pull/43836", + "files_url": "https://github.com/huggingface/transformers/pull/44126/files", + "html_url": "https://github.com/huggingface/transformers/pull/44126", "labels": [], - "merged": false, - "number": 43836, - "review_comments_count": 2, - "state": "open", - "title": "fix: wrapped TypeAdpater in string literals (for now)", - "updated_at": "2026-02-17T04:46:27Z" + "merged": true, + "number": 44126, + "review_comments_count": 0, + "state": "closed", + "title": "Simplify input preparation in generate", + "updated_at": "2026-02-18T10:30:48Z" }, { - "additions": 5, - "author": "nulone", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "Fixes #43828 ## What does this PR do? `torch._grouped_mm` is not registered for autocast. Under `torch.autocast`, LayerNorm outputs float32 while model weights stay bfloat16, causing RuntimeError: \"expected mat1 and mat2 to have same dtype\u2026", + "additions": 8, + "author": "zucchini-nlp", + "author_association": "MEMBER", + "body_excerpt": "# What does this PR do? Fixes https://github.com/huggingface/transformers/issues/43986", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/43833", - "created_at": "2026-02-08T07:26:06Z", - "deletions": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44125", + "created_at": "2026-02-18T09:34:54Z", + "deletions": 7, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43833/files", - "html_url": "https://github.com/huggingface/transformers/pull/43833", + "files_url": "https://github.com/huggingface/transformers/pull/44125/files", + "html_url": "https://github.com/huggingface/transformers/pull/44125", "labels": [], - "merged": false, - "number": 43833, - "review_comments_count": 0, - "state": "open", - "title": "fix: ensure dtype consistency in grouped_mm under autocast", - "updated_at": "2026-02-11T02:28:43Z" + "merged": true, + "number": 44125, + "review_comments_count": 2, + "state": "closed", + "title": "Raise informative error when loading video processors", + "updated_at": "2026-02-20T08:23:35Z" }, { - "additions": 0, - "author": "nulone", - "author_association": "FIRST_TIME_CONTRIBUTOR", - "body_excerpt": "Fixes #43827 ## What does this PR do? Removes deprecated `pipeline()` examples from summarization.md and translation.md that reference pre-v5 API. The manual `model.generate()` approach is preserved. ## Before submitting - [x] This PR fixe\u2026", + "additions": 10, + "author": "mariam851", + "author_association": "CONTRIBUTOR", + "body_excerpt": "Description: Adds eval_on_end to TrainingArguments to force evaluation at the end of training, even if the last step doesn't align with eval_steps. Changes: training_args.py: Added eval_on_end field. trainer.py: Added logic to call evaluat\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/43832", - "created_at": "2026-02-08T07:06:47Z", - "deletions": 27, + "comments_count": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44124", + "created_at": "2026-02-18T08:52:23Z", + "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43832/files", - "html_url": "https://github.com/huggingface/transformers/pull/43832", + "files_url": "https://github.com/huggingface/transformers/pull/44124/files", + "html_url": "https://github.com/huggingface/transformers/pull/44124", "labels": [], "merged": false, - "number": 43832, + "number": 44124, "review_comments_count": 0, "state": "closed", - "title": "docs: remove deprecated pipeline examples from summarization and tran\u2026", - "updated_at": "2026-02-08T07:19:52Z" + "title": "feat: add eval_on_end to Trainer for final evaluation", + "updated_at": "2026-02-18T14:14:16Z" }, { - "additions": 0, - "author": "Mr-Neutr0n", + "additions": 33, + "author": "cyyever", "author_association": "CONTRIBUTOR", - "body_excerpt": "## Summary - Removes `pipeline()`-based inference examples from summarization and translation task documentation - These examples no longer work in v5 since `SummarizationPipeline` and `TranslationPipeline` were removed ## Background Accor\u2026", - "changed_files": 2, + "body_excerpt": "# What does this PR do? This PR avoids device sync in training loss accumulation by ```torch.where```. The `is_torch_xla_available` condition is also removed.", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/43831", - "created_at": "2026-02-08T06:39:23Z", - "deletions": 27, + "conversation_url": "https://github.com/huggingface/transformers/pull/44123", + "created_at": "2026-02-18T08:22:57Z", + "deletions": 22, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43831/files", - "html_url": "https://github.com/huggingface/transformers/pull/43831", + "files_url": "https://github.com/huggingface/transformers/pull/44123/files", + "html_url": "https://github.com/huggingface/transformers/pull/44123", "labels": [], - "merged": true, - "number": 43831, + "merged": false, + "number": 44123, "review_comments_count": 0, - "state": "closed", - "title": "[docs] Remove pipeline() examples from summarization/translation tasks", - "updated_at": "2026-02-09T12:33:04Z" + "state": "open", + "title": "Avoid device sync in training loss accumulation", + "updated_at": "2026-03-30T07:57:16Z" }, { - "additions": 7792, - "author": "bozheng-hit", - "author_association": "CONTRIBUTOR", - "body_excerpt": "This PR adds the support of codes for the upcoming Qwen3.5 series models. For information about Qwen, please visit: \ud83d\udc49https://qwen.ai Special thanks to @JJJYmmm for helping complete the code in this PR. We also appreciate the valuable feedb\u2026", - "changed_files": 28, + "additions": 158, + "author": "adityuhkapoor", + "author_association": "NONE", + "body_excerpt": "# What does this PR do? Adds 4-bit embedding quantization for BitsAndBytes, mirroring TorchAO's existing `include_input_output_embeddings` and `untie_embedding_weights` pattern (PRs #37802, #37905, #37935). Large-vocabulary models (Llama 3\u2026", + "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 6, - "conversation_url": "https://github.com/huggingface/transformers/pull/43830", - "created_at": "2026-02-08T05:51:57Z", + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44122", + "created_at": "2026-02-18T06:35:09Z", "deletions": 2, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43830/files", - "html_url": "https://github.com/huggingface/transformers/pull/43830", + "files_url": "https://github.com/huggingface/transformers/pull/44122/files", + "html_url": "https://github.com/huggingface/transformers/pull/44122", "labels": [ - "New model" + "Code agent slop" ], - "merged": true, - "number": 43830, + "merged": false, + "number": 44122, "review_comments_count": 0, "state": "closed", - "title": "Adding Support for Qwen3.5", - "updated_at": "2026-03-03T02:26:31Z" + "title": "Add BnB 4-bit embedding quantization support", + "updated_at": "2026-02-18T14:27:25Z" }, { - "additions": 30, - "author": "jayzuccarelli", + "additions": 14, + "author": "tirth8205", "author_association": "NONE", - "body_excerpt": "Fixes #43805 Follow-up to #43794: add a pytest fixture that sets a fixed seed (42) before each test so we always get the same RNG state in model tests and improve determinism. - **`tests/conftest.py`** (new): `set_seed` fixture with `autou\u2026", + "body_excerpt": "Fixes #34920 After applying `normalize()`, images can have negative values. Calling `resize()` on such images fails because it internally converts to PIL, which requires values in [0, 1] or [0, 255]. ### Fix When the image has values outsi\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, - "conversation_url": "https://github.com/huggingface/transformers/pull/43829", - "created_at": "2026-02-08T05:10:32Z", + "conversation_url": "https://github.com/huggingface/transformers/pull/44120", + "created_at": "2026-02-17T23:56:48Z", "deletions": 0, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43829/files", - "html_url": "https://github.com/huggingface/transformers/pull/43829", + "files_url": "https://github.com/huggingface/transformers/pull/44120/files", + "html_url": "https://github.com/huggingface/transformers/pull/44120", "labels": [ "Code agent slop" ], "merged": false, - "number": 43829, + "number": 44120, "review_comments_count": 0, "state": "closed", - "title": "chore(tests): add set_seed pytest fixture for determinism", - "updated_at": "2026-02-10T01:55:12Z" + "title": "fix: allow image_transforms.resize to handle negative values after normalization", + "updated_at": "2026-02-18T14:08:54Z" }, { - "additions": 2, - "author": "math-hiyoko", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? ## Related Issue Fixes #40170 **Issue:** Add MXFP4 MoE/attention backward kernels **URL:** https://github.com/huggingface/transformers/issues/40170 ## Problem ## A Call To Action! The Hugg\u2026", - "changed_files": 6, + "additions": 63, + "author": "23atharvaS", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "## What does this PR do? This PR introduces a new argument `eval_on_end` to the `Trainer` class. When enabled, the Trainer automatically runs evaluation at the end of training. This allows users to obtain final evaluation metrics without e\u2026", + "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 7, - "conversation_url": "https://github.com/huggingface/transformers/pull/43771", - "created_at": "2026-02-05T15:12:21Z", - "deletions": 4, + "comments_count": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44067", + "created_at": "2026-02-17T05:25:26Z", + "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43771/files", - "html_url": "https://github.com/huggingface/transformers/pull/43771", + "files_url": "https://github.com/huggingface/transformers/pull/44067/files", + "html_url": "https://github.com/huggingface/transformers/pull/44067", "labels": [ "Code agent slop" ], "merged": false, - "number": 43771, + "number": 44067, "review_comments_count": 0, "state": "closed", - "title": "fix: Add MXFP4 MoE/attention backward kernels", - "updated_at": "2026-03-24T14:14:44Z" + "title": "Add `eval_on_end` argument to Trainer for final evaluation after training", + "updated_at": "2026-02-17T13:32:34Z" }, { - "additions": 47, - "author": "lordaarush", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? Removes the unconditional `self.state.train_batch_size = self._train_batch_size` assignment that was causing issues when resuming from checkpoint with different batch configurations. The `train_batch_size` should on\u2026", + "additions": 35, + "author": "Jay-IIT", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "Migrate GPT-J from manual boilerplate output collection to the new decorator-based output tracing system: - Add `_can_record_outputs` to `GPTJPreTrainedModel` - Add `@capture_outputs` and `@merge_with_config_defaults` to `GPTJModel.forward\u2026", "changed_files": 2, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 7, - "conversation_url": "https://github.com/huggingface/transformers/pull/43770", - "created_at": "2026-02-05T14:25:36Z", - "deletions": 1, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43770/files", - "html_url": "https://github.com/huggingface/transformers/pull/43770", - "labels": [], - "merged": true, - "number": 43770, - "review_comments_count": 0, - "state": "closed", - "title": "Remove unconditional train_batch_size assignment", - "updated_at": "2026-02-06T14:47:16Z" - }, - { - "additions": 3950, - "author": "eustlb", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? Adds voxtral realtime! ## benchmarks Using [this reproducer](https://gist.github.com/eustlb/367f062f77a5971291fb5350763bea8d), I've ran WER evals on ami, librispeech and fleurs, with results Dataset | Original (vllm\u2026", - "changed_files": 21, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 4, - "conversation_url": "https://github.com/huggingface/transformers/pull/43769", - "created_at": "2026-02-05T14:17:52Z", - "deletions": 2, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43769/files", - "html_url": "https://github.com/huggingface/transformers/pull/43769", - "labels": [ - "New model", - "Audio" - ], - "merged": true, - "number": 43769, - "review_comments_count": 39, - "state": "closed", - "title": "Add Voxtral Realtime", - "updated_at": "2026-02-26T10:18:32Z" - }, - { - "additions": 87, - "author": "zucchini-nlp", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? Helps vLLM to bump to v5", - "changed_files": 6, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 5, - "conversation_url": "https://github.com/huggingface/transformers/pull/43768", - "created_at": "2026-02-05T14:04:02Z", - "deletions": 5, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43768/files", - "html_url": "https://github.com/huggingface/transformers/pull/43768", - "labels": [], - "merged": true, - "number": 43768, - "review_comments_count": 10, - "state": "closed", - "title": "Fix init weights in remote code", - "updated_at": "2026-02-17T14:45:18Z" - }, - { - "additions": 850, - "author": "XingweiDeng", - "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? src/transformers/utils/import_utils.py:2317:16\u2026", - "changed_files": 0, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, + "additions": 79, + "author": "ManasVardhan", + "author_association": "NONE", + "body_excerpt": "## What does this PR do? Refactors the `swin` model to use the standardized output collection interface (`@capture_outputs` and `@can_return_tuple` decorators), as described in #43979. ### Changes **SwinPreTrainedModel:** - Added `_can_rec\u2026", + "changed_files": 2, + "cluster_id": "cluster-43979-28", + "cluster_ids": [ + "cluster-43979-28" + ], + "cluster_role": "member", "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/43709", - "created_at": "2026-02-03T14:26:58Z", - "deletions": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/44011", + "created_at": "2026-02-15T11:11:02Z", + "deletions": 146, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43709/files", - "html_url": "https://github.com/huggingface/transformers/pull/43709", + "files_url": "https://github.com/huggingface/transformers/pull/44011/files", + "html_url": "https://github.com/huggingface/transformers/pull/44011", "labels": [], - "merged": true, - "number": 43709, + "merged": false, + "number": 44011, "review_comments_count": 0, "state": "closed", - "title": "fix: `VersionComparison.from_string` return type mismatch", - "updated_at": "2026-02-23T19:05:33Z" + "title": "Refactor Swin output tracing with @capture_outputs and @can_return_tuple", + "updated_at": "2026-02-17T14:15:17Z" }, { - "additions": 2202, - "author": "liu-jiaxuan", + "additions": 41, + "author": "preetam1407", "author_association": "CONTRIBUTOR", - "body_excerpt": "# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Did you read the [contributor guideline](https://github.com/huggingfa\u2026", - "changed_files": 16, - "cluster_id": "cluster-43098-11", + "body_excerpt": "#43979. Refactors SqueezeBert to the standardized output collection interface: - Adds `_can_record_outputs` in `SqueezeBertPreTrainedModel` - Adds `@capture_outputs` on `SqueezeBertModel.forward` - Adds `@can_return_tuple` on task model fo\u2026", + "changed_files": 1, + "cluster_id": "cluster-43979-28", "cluster_ids": [ - "cluster-43098-11" + "cluster-43979-28" ], "cluster_role": "member", - "comments_count": 11, - "conversation_url": "https://github.com/huggingface/transformers/pull/43707", - "created_at": "2026-02-03T13:33:41Z", - "deletions": 0, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44010", + "created_at": "2026-02-15T09:40:09Z", + "deletions": 139, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43707/files", - "html_url": "https://github.com/huggingface/transformers/pull/43707", - "labels": [ - "New model" - ], - "merged": true, - "number": 43707, - "review_comments_count": 145, - "state": "closed", - "title": "[Model] Add SLANeXt Model Support", - "updated_at": "2026-03-20T17:24:22Z" + "files_url": "https://github.com/huggingface/transformers/pull/44010/files", + "html_url": "https://github.com/huggingface/transformers/pull/44010", + "labels": [], + "merged": false, + "number": 44010, + "review_comments_count": 2, + "state": "open", + "title": "[SqueezeBert] Migrate to standardized output collection decorators", + "updated_at": "2026-03-02T13:04:52Z" }, { - "additions": 42, - "author": "vasqu", - "author_association": "MEMBER", - "body_excerpt": "As per title, the new way to call the attention interface has slipped through a refactor because it's too new and not too well known atm cc @yonigozlan", - "changed_files": 9, + "additions": 1, + "author": "mariam851", + "author_association": "CONTRIBUTOR", + "body_excerpt": "Fixes #43976 Updated the documentation to reflect the actual Python requirement (3.10+) as defined in setup.py. Changes: Updated README.md .", + "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/43706", - "created_at": "2026-02-03T11:57:22Z", - "deletions": 48, + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44009", + "created_at": "2026-02-15T08:51:26Z", + "deletions": 1, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43706/files", - "html_url": "https://github.com/huggingface/transformers/pull/43706", + "files_url": "https://github.com/huggingface/transformers/pull/44009/files", + "html_url": "https://github.com/huggingface/transformers/pull/44009", "labels": [], "merged": true, - "number": 43706, - "review_comments_count": 2, + "number": 44009, + "review_comments_count": 0, "state": "closed", - "title": "[`Attn`] Fixup interface usage after refactor", - "updated_at": "2026-02-03T14:56:35Z" + "title": "update python requirement to 3.10+ to match codebase", + "updated_at": "2026-02-16T13:46:56Z" }, { - "additions": 120, - "author": "Cyrilvallez", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? Allow the `is_causal` kwarg and config attribute to make well-behaved decoder-only models act as encoders", + "additions": 26, + "author": "pdwi2020", + "author_association": "FIRST_TIME_CONTRIBUTOR", + "body_excerpt": "## Summary - refactor `ResNetModel` to use `@capture_outputs` for hidden-state collection - register `_can_record_outputs` on `ResNetPreTrainedModel` with `ResNetStage` - switch `ResNetForImageClassification` and `ResNetBackbone` to `@can_\u2026", "changed_files": 3, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/43705", - "created_at": "2026-02-03T11:45:43Z", - "deletions": 0, + "cluster_id": "cluster-43979-28", + "cluster_ids": [ + "cluster-43979-28" + ], + "cluster_role": "member", + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44007", + "created_at": "2026-02-15T07:26:52Z", + "deletions": 58, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43705/files", - "html_url": "https://github.com/huggingface/transformers/pull/43705", + "files_url": "https://github.com/huggingface/transformers/pull/44007/files", + "html_url": "https://github.com/huggingface/transformers/pull/44007", "labels": [], - "merged": true, - "number": 43705, - "review_comments_count": 11, - "state": "closed", - "title": "Allow bi-directional attention for all models", - "updated_at": "2026-02-04T17:24:32Z" + "merged": false, + "number": 44007, + "review_comments_count": 0, + "state": "open", + "title": "[ResNet] Refactor output tracing to decorator-based interface", + "updated_at": "2026-02-19T15:49:49Z" }, { - "additions": 1, - "author": "francesco-bertolotti", + "additions": 8, + "author": "cyyever", "author_association": "CONTRIBUTOR", - "body_excerpt": "wrong `rms_norm_type` # What does this PR do? Small type error in the configuration of qwen3. `rms_norm_eps` should be a float and not an int. ## Before submitting - [ X] This PR fixes a typo or improves the docs (you can dismiss the other\u2026", - "changed_files": 1, + "body_excerpt": "# What does this PR do? This PR uses torch.xlogy for better numerical handling.", + "changed_files": 8, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, - "conversation_url": "https://github.com/huggingface/transformers/pull/43703", - "created_at": "2026-02-03T10:05:17Z", - "deletions": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44006", + "created_at": "2026-02-15T04:07:50Z", + "deletions": 8, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43703/files", - "html_url": "https://github.com/huggingface/transformers/pull/43703", + "files_url": "https://github.com/huggingface/transformers/pull/44006/files", + "html_url": "https://github.com/huggingface/transformers/pull/44006", "labels": [], "merged": true, - "number": 43703, + "number": 44006, "review_comments_count": 0, "state": "closed", - "title": "Update configuration_qwen3.py", - "updated_at": "2026-02-04T07:03:04Z" + "title": "Use torch.xlogy ", + "updated_at": "2026-02-17T00:42:54Z" }, { - "additions": 2828, - "author": "eustlb", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? Adds[ UsefulSensors'](https://huggingface.co/UsefulSensors) new ASR model.", - "changed_files": 19, + "additions": 224, + "author": "cyyever", + "author_association": "CONTRIBUTOR", + "body_excerpt": "# What does this PR do? This PR transfers grid_thw to a python list at the beginning of some functions to reduce later CUDA sync calls. Therefore, several sync calls are merged into one call.", + "changed_files": 16, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/43702", - "created_at": "2026-02-03T09:32:42Z", - "deletions": 247, + "conversation_url": "https://github.com/huggingface/transformers/pull/44005", + "created_at": "2026-02-15T02:34:55Z", + "deletions": 254, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43702/files", - "html_url": "https://github.com/huggingface/transformers/pull/43702", - "labels": [ - "New model" - ], + "files_url": "https://github.com/huggingface/transformers/pull/44005/files", + "html_url": "https://github.com/huggingface/transformers/pull/44005", + "labels": [], "merged": true, - "number": 43702, - "review_comments_count": 30, + "number": 44005, + "review_comments_count": 1, "state": "closed", - "title": "Add moonshine streaming", - "updated_at": "2026-02-12T10:10:16Z" + "title": "Reduce reduce CUDA sync", + "updated_at": "2026-02-17T01:00:52Z" }, { - "additions": 1, - "author": "YangKai0616", + "additions": 21, + "author": "omkar-334", "author_association": "CONTRIBUTOR", - "body_excerpt": "Here pytorch has a mature mechanism to auto select the right backend for different devices. @ydshieh pls help review, thx!", + "body_excerpt": "This PR refactors the `codegen` model as per #43979 cc @molbap \"Screenshot 2 tests are bei\u2026", "changed_files": 1, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 6, - "conversation_url": "https://github.com/huggingface/transformers/pull/43699", - "created_at": "2026-02-03T07:33:04Z", - "deletions": 1, + "cluster_id": "cluster-43998-11", + "cluster_ids": [ + "cluster-43998-11" + ], + "cluster_role": "member", + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44004", + "created_at": "2026-02-14T23:56:18Z", + "deletions": 62, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43699/files", - "html_url": "https://github.com/huggingface/transformers/pull/43699", + "files_url": "https://github.com/huggingface/transformers/pull/44004/files", + "html_url": "https://github.com/huggingface/transformers/pull/44004", "labels": [], "merged": false, - "number": 43699, - "review_comments_count": 3, - "state": "closed", - "title": "avoid using specified backend for tp tests", - "updated_at": "2026-03-09T08:17:48Z" + "number": 44004, + "review_comments_count": 0, + "state": "open", + "title": "refactor output tracing for `codegen`", + "updated_at": "2026-02-17T08:56:07Z" }, { - "additions": 1, - "author": "sywangyi", + "additions": 37, + "author": "omkar-334", "author_association": "CONTRIBUTOR", - "body_excerpt": "- model loading (from pretrained, etc): @CyrilVallez - distributed: @3outeille @ArthurZucker fix tp crash. crash stack is [rank0]: Traceback (most recent call last): [rank0]: File \"/transformers/benchmark_v2/test_tp.py\", line 29, in Note - Only 46 te\u2026", + "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 5, - "conversation_url": "https://github.com/huggingface/transformers/pull/43695", - "created_at": "2026-02-03T01:30:55Z", - "deletions": 1, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44003", + "created_at": "2026-02-14T23:46:10Z", + "deletions": 68, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43695/files", - "html_url": "https://github.com/huggingface/transformers/pull/43695", + "files_url": "https://github.com/huggingface/transformers/pull/44003/files", + "html_url": "https://github.com/huggingface/transformers/pull/44003", "labels": [], - "merged": true, - "number": 43695, + "merged": false, + "number": 44003, "review_comments_count": 0, - "state": "closed", - "title": "fix gptoss tp crash", - "updated_at": "2026-02-03T10:20:30Z" + "state": "open", + "title": "refactor output tracing in `mamba`", + "updated_at": "2026-02-17T07:40:50Z" }, { - "additions": 1, - "author": "stevhliu", - "author_association": "MEMBER", - "body_excerpt": "updates link to benchmark's new location", + "additions": 7, + "author": "omkar-334", + "author_association": "CONTRIBUTOR", + "body_excerpt": "This PR refactors the `upernet` model as per #43979 cc @molbap \"Screenshot", "changed_files": 1, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/43694", - "created_at": "2026-02-03T01:21:15Z", - "deletions": 1, + "cluster_id": "cluster-43998-11", + "cluster_ids": [ + "cluster-43998-11" + ], + "cluster_role": "member", + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/44002", + "created_at": "2026-02-14T23:21:45Z", + "deletions": 20, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43694/files", - "html_url": "https://github.com/huggingface/transformers/pull/43694", + "files_url": "https://github.com/huggingface/transformers/pull/44002/files", + "html_url": "https://github.com/huggingface/transformers/pull/44002", "labels": [], - "merged": true, - "number": 43694, + "merged": false, + "number": 44002, "review_comments_count": 0, - "state": "closed", - "title": "[docs] benchmarks", - "updated_at": "2026-02-03T17:00:13Z" + "state": "open", + "title": "refactor output tracing in `upernet`", + "updated_at": "2026-02-17T08:55:16Z" }, { - "additions": 1, - "author": "WilliamRoyNelson", - "author_association": "NONE", - "body_excerpt": "# Update doc preprocessing regex to prevent ReDoS The regular expression for capturing docstrings is vulnerable to a [ReDoS attack](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS) The previous change d\u2026", + "additions": 3, + "author": "omkar-334", + "author_association": "CONTRIBUTOR", + "body_excerpt": "This PR refactors the`univnet` model as per #43979 cc @molbap \"Screenshot", "changed_files": 1, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 4, - "conversation_url": "https://github.com/huggingface/transformers/pull/43693", - "created_at": "2026-02-03T01:06:06Z", - "deletions": 1, + "cluster_id": "cluster-43998-11", + "cluster_ids": [ + "cluster-43998-11" + ], + "cluster_role": "member", + "comments_count": 1, + "conversation_url": "https://github.com/huggingface/transformers/pull/44001", + "created_at": "2026-02-14T22:50:39Z", + "deletions": 9, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43693/files", - "html_url": "https://github.com/huggingface/transformers/pull/43693", + "files_url": "https://github.com/huggingface/transformers/pull/44001/files", + "html_url": "https://github.com/huggingface/transformers/pull/44001", "labels": [], "merged": false, - "number": 43693, + "number": 44001, "review_comments_count": 0, - "state": "closed", - "title": "Update doc preprocessing regex to prevent ReDoS", - "updated_at": "2026-02-03T17:23:59Z" + "state": "open", + "title": "refactor output tracing in `univnet`", + "updated_at": "2026-02-14T23:22:13Z" }, { - "additions": 13, - "author": "qgallouedec", - "author_association": "MEMBER", - "body_excerpt": "## Summary On PyTorch 2.10+, `Trainer.train()` crashes at the first `lr_scheduler.step()` when using DeepSpeed ZeRO-3 with a PEFT model. This PR provides fix, alothough I'm sure it's not the ideal one. The failure only appears with torch 2\u2026", + "additions": 8, + "author": "omkar-334", + "author_association": "CONTRIBUTOR", + "body_excerpt": "This PR refactors the `vision_text_dual_encoder` model issue as per #43979 cc @molbap \"Screenshot", "changed_files": 1, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, + "cluster_id": "cluster-43998-11", + "cluster_ids": [ + "cluster-43998-11" + ], + "cluster_role": "member", "comments_count": 1, - "conversation_url": "https://github.com/huggingface/transformers/pull/43689", - "created_at": "2026-02-02T16:18:36Z", - "deletions": 0, + "conversation_url": "https://github.com/huggingface/transformers/pull/43998", + "created_at": "2026-02-14T22:12:30Z", + "deletions": 19, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43689/files", - "html_url": "https://github.com/huggingface/transformers/pull/43689", + "files_url": "https://github.com/huggingface/transformers/pull/43998/files", + "html_url": "https://github.com/huggingface/transformers/pull/43998", "labels": [], - "merged": true, - "number": 43689, + "merged": false, + "number": 43998, "review_comments_count": 0, - "state": "closed", - "title": "update guide with new attr name for toks", - "updated_at": "2026-02-02T21:04:22Z" - }, - { - "additions": 164, - "author": "tarekziade", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? - adds Bandit's S110 that makes sure we don't have a dry `Except` - fixes all occurrences - mark a couple of spots where we could tighten the `Exception` catch all I focused on making changes under `src/transformers\u2026", - "changed_files": 18, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 5, - "conversation_url": "https://github.com/huggingface/transformers/pull/43687", - "created_at": "2026-02-02T15:29:48Z", - "deletions": 150, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43687/files", - "html_url": "https://github.com/huggingface/transformers/pull/43687", - "labels": [], - "merged": true, - "number": 43687, - "review_comments_count": 37, - "state": "closed", - "title": "Added S110 - try-except-pass rule", - "updated_at": "2026-02-03T21:20:36Z" + "state": "open", + "title": "refactor output tracing in `timm_backbone`", + "updated_at": "2026-02-21T07:29:47Z" }, { - "additions": 1, - "author": "jianchang512", - "author_association": "NONE", - "body_excerpt": "Tokenization should be performed on the source language, i.e., `fi_text`. # What does this PR do? makes the whole mixin behave like a static holder for methods... - Modify methods/inherited cl\u2026", - "changed_files": 137, - "cluster_id": null, - "cluster_ids": [], - "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/43620", - "created_at": "2026-01-30T11:24:09Z", - "deletions": 288, + "additions": 47, + "author": "kaixuanliu", + "author_association": "CONTRIBUTOR", + "body_excerpt": "@ydshieh , pls help review, thx!", + "changed_files": 3, + "cluster_id": "cluster-43324-21", + "cluster_ids": [ + "cluster-43324-21" + ], + "cluster_role": "canonical", + "comments_count": 6, + "conversation_url": "https://github.com/huggingface/transformers/pull/43936", + "created_at": "2026-02-12T08:34:03Z", + "deletions": 19, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43620/files", - "html_url": "https://github.com/huggingface/transformers/pull/43620", + "files_url": "https://github.com/huggingface/transformers/pull/43936/files", + "html_url": "https://github.com/huggingface/transformers/pull/43936", "labels": [], "merged": true, - "number": 43620, - "review_comments_count": 0, + "number": 43936, + "review_comments_count": 13, "state": "closed", - "title": "[`Rope`] Revert #43410 and make inheritance implicit again", - "updated_at": "2026-01-30T18:44:16Z" + "title": "Fix failed unit tests for moonshine_streaming model", + "updated_at": "2026-03-06T07:39:09Z" }, { - "additions": 40, - "author": "zucchini-nlp", - "author_association": "MEMBER", - "body_excerpt": "# What does this PR do? As per title, some models add or delete entries in tied weights depending on configuration. If we load two models consecutively with different configs, it fails to tie weights correctly I am copying it in `__init__`\u2026", - "changed_files": 4, + "additions": 1245, + "author": "MekkCyber", + "author_association": "CONTRIBUTOR", + "body_excerpt": "# What does this PR do? Adds mlx quantization for mps devices leveraging the `kernels` library for pre-built kernels !!", + "changed_files": 13, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 4, - "conversation_url": "https://github.com/huggingface/transformers/pull/43619", - "created_at": "2026-01-30T10:43:38Z", - "deletions": 6, + "comments_count": 2, + "conversation_url": "https://github.com/huggingface/transformers/pull/43934", + "created_at": "2026-02-12T07:59:02Z", + "deletions": 4, "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43619/files", - "html_url": "https://github.com/huggingface/transformers/pull/43619", - "labels": [ - "for patch" - ], + "files_url": "https://github.com/huggingface/transformers/pull/43934/files", + "html_url": "https://github.com/huggingface/transformers/pull/43934", + "labels": [], "merged": true, - "number": 43619, - "review_comments_count": 8, + "number": 43934, + "review_comments_count": 20, "state": "closed", - "title": "Don't modify `tied_weight_keys` in-place", - "updated_at": "2026-01-30T15:46:02Z" + "title": "[Quantization] Add metal quantization for MPS devices!", + "updated_at": "2026-02-27T13:28:31Z" }, { - "additions": 17, - "author": "kaixuanliu", + "additions": 66, + "author": "quic-meetkuma", "author_association": "CONTRIBUTOR", - "body_excerpt": "@zucchini-nlp pls help review, thx! We have to add back the changes in https://github.com/huggingface/transformers/pull/42523. As for llava_onevision model, in its checkpoint config file, the model's `tie_word_embeddings` is Flase, and mod\u2026", - "changed_files": 3, + "body_excerpt": "# What does this PR do? This PR adds hardware backend called \"qaic\" which is for Qualcomm's AI Accelerator. The inclusion is similar to any other hardware backend in the Trainer. With this the user will be able to use Qualcomm's AI Acceler\u2026", + "changed_files": 9, "cluster_id": null, "cluster_ids": [], "cluster_role": null, - "comments_count": 3, - "conversation_url": "https://github.com/huggingface/transformers/pull/43617", - "created_at": "2026-01-30T10:21:45Z", - "deletions": 0, - "draft": false, - "files_url": "https://github.com/huggingface/transformers/pull/43617/files", - "html_url": "https://github.com/huggingface/transformers/pull/43617", + "comments_count": 4, + "conversation_url": "https://github.com/huggingface/transformers/pull/43933", + "created_at": "2026-02-12T06:14:52Z", + "deletions": 2, + "draft": true, + "files_url": "https://github.com/huggingface/transformers/pull/43933/files", + "html_url": "https://github.com/huggingface/transformers/pull/43933", "labels": [], "merged": false, - "number": 43617, + "number": 43933, "review_comments_count": 0, "state": "closed", - "title": "Fix tie_word_embedding issue for llava_onevision model", - "updated_at": "2026-01-30T14:33:39Z" + "title": "Added support for qaic backend for Qualcomm's AI Accelerator", + "updated_at": "2026-02-17T16:53:38Z" }, { "additions": 3, - "author": "yiliu30", + "author": "quic-meetkuma", "author_association": "CONTRIBUTOR", - "body_excerpt": "Signed-off-by: yiliu30 # What does this PR do? ## Related Issue Fixes #43408 **Issue:** Warning: You are using a model of type sam3_video to instantiate a model of type sam3_tracker **URL:** https://github.com/huggingface/transformers/\u2026", - "changed_files": 8, + "body_excerpt": "# What does this PR do?