# Digging through tenets and time

## Introduction

The `transformers` library, built on `PyTorch`, supports all state-of-the-art LLMs, many VLMs, task-specific vision-language models, video models, audio models, table models, and classical encoders, for a global count of almost 400 models. The name of the library itself is mostly majority-driven, as many models, such as Mamba or RWKV, are not even transformer architectures. Regardless, each of these was wrought by the research and engineering team that created it, then harmonized into a now-famous interface, callable with a simple `.from_pretrained`. Both inference and training are supported. The library underpins ML courses and cookbooks, and several thousand other open-source libraries depend on it. All models are tested as part of a daily CI that ensures their preservation and reproducibility. Most importantly, it is open-source, and a large part of it has been written by the community.
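
In practice, that interface is a couple of lines. Here is a minimal sketch; the checkpoint id is only an illustrative example, and any Hub checkpoint works the same way:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is just a small, familiar example; any model id on the Hub loads the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The tenets of transformers are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```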

The ML wave has not stopped; more and more models keep being added. `Transformers` is widely used, and we read the feedback that users post, whether it is about a function that had 300+ keyword arguments, duplicated code and helpers, `Copied from ...` comments everywhere, or optimisation concerns. Text-only models are by now relatively tame, but multimodal models remain to be harmonized.
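
For readers who have not run into it, a `Copied from` comment marks code that is a literal copy of another model's, kept in sync by tooling rather than shared through inheritance. The model names below are purely illustrative:

```python
import torch.nn as nn

# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->Gemma
class GemmaRMSNorm(nn.Module):
    """The body would be an exact, mechanically-checked copy of LlamaRMSNorm, names rewritten."""
```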

Here we will dissect the design philosophy of `transformers`, as a continuation of the existing, older [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page and the accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). Some time ago, I dare not say how long, we discussed the state of things with `transformers` maintainers. A lot of recent developments were satisfactory, but if we only talked about those, self-congratulation would be the only goalpost. Reflecting on this philosophy now, as models pile up, is essential and will drive new developments.

### What you will learn

Every reader, whether an OSS maintainer, power user, or casual fine-tuner, will walk away knowing how to reason about the `transformers` code base, how to use it better, and how to meaningfully contribute to it. Along the way, we will also showcase new features you might have missed, so you'll be up to date.

So, what are the principles of `transformers`? We will try to summarize the foundations on which we've built everything, and write down the "tenets" of the library. They behave like _software interfaces_, hence it is crucial to state them explicitly. However opinionated they are, they have evolved over time.

0. <a id="source-of-truth"></a> Overarching guideline: we should be a source of truth for all model definitions. This is not a tenet, but it still guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performance.

1. <a id="one-model-one-file"></a> One model, one file: all inference logic (and most of training; the loss is separate, not part of the model) visible, top to bottom.
2. <a id="code-is-product"></a> Code is the product: optimize for reading, diffing, and tweaking; our users are power users. Variables can be explicit, full words, even several words. Readability is primordial.
3. <a id="standardize-dont-abstract"></a> Standardize, don't abstract: if it's model behavior, keep it in the file; abstractions are reserved for generic infrastructure.
4. <a id="do-repeat-yourself"></a> DRY* (DO Repeat Yourself) via the copy mechanism: copy when it helps users; keep successors in sync without centralizing behavior.
   - We amend this tenet: with the introduction and global adoption of [`modular`](#modular) transformers, we no longer repeat any logic in the `modular` files, but the end-user modeling files remain faithful to the original tenet.
5. <a id="minimal-user-api"></a> Minimal user API: config, model, preprocessing; `from_pretrained`, `save_pretrained`, `push_to_hub`. We want the least amount of codepaths. Reading should be obvious, configurations should be obvious.
6. <a id="backwards-compatibility"></a> Backwards compatibility first: evolve by additive standardization, **never** break public APIs.
   - Some models see almost no use, and we have stopped adding new features for non-`torch` frameworks. Still, we adapt to models that exist on the Hub.
7. <a id="consistent-public-surface"></a> Consistent public surface, enforced by tests: same argument names, same outputs, hidden states and attentions exposed.
8. We are not a modular toolbox: components should be separable, and users encouraged to use PyTorch directly for further usage.
   - This is the largest change. We ARE a toolbox. What we are not is a framework: you should not be FORCED to rewrite every modeling file, but it is _better_ for your model to inherit from `PreTrainedModel` and gain tensor parallelism, `from_pretrained`, sharding, `push_to_hub`, and loss support, as well as PEFT/TRL/SGLang/vLLM compatibility, as the sketch after this list illustrates.
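
To make that concrete, here is a minimal sketch of a custom model riding on the library. `TinyConfig` and `TinyModel` are hypothetical names invented for this example, but the pattern of inheriting from `PretrainedConfig` and `PreTrainedModel` is what grants serialization, loading, and Hub integration for free:

```python
from torch import nn
from transformers import PreTrainedModel, PretrainedConfig

class TinyConfig(PretrainedConfig):
    # Hypothetical toy config; real configs declare their hyperparameters the same way.
    model_type = "tiny"

    def __init__(self, vocab_size=100, hidden_size=64, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        super().__init__(**kwargs)

class TinyModel(PreTrainedModel):
    # Inheriting from PreTrainedModel is what enables from_pretrained,
    # save_pretrained, push_to_hub, device placement, and the rest.
    config_class = TinyConfig

    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.hidden_size)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size)

    def forward(self, input_ids):
        return self.lm_head(self.embed(input_ids))

model = TinyModel(TinyConfig())
model.save_pretrained("tiny-model")                 # writes config.json + safetensors weights
reloaded = TinyModel.from_pretrained("tiny-model")  # round-trips with no extra code
```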
 
When a PR is merged, it is because the contribution is worthwhile, and because the `transformers` team finds the design of the contribution to be aligned with the tenets above.