# Digging through tenets and time

## Introduction

The `transformers` library, built with `PyTorch`, supports all state-of-the-art LLMs, many VLMs, task-specific vision-language models, video models, audio models, table models, and classical encoders, for a global count of almost 400 models. The name of the library itself is mostly majority-driven, as many of these models, like Mamba or RWKV, are not even transformer architectures. Regardless, each of them was wrought by the research and engineering team that created it, then harmonized into a now-famous interface and made callable with a simple `.from_pretrained`. Both inference and training are supported. The library underpins ML courses and cookbooks, and several thousand other open-source libraries depend on it. All models are tested as part of a daily CI that ensures their preservation and reproducibility. Most importantly, it is open source and has, for a large part, been written by the community.
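As a small illustration (using `"gpt2"` as an arbitrary checkpoint and the standard Auto classes), that harmonized interface boils down to a couple of lines:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the ~400 architectures loads through the same call.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The design philosophy of transformers is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```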
The ML wave has not stopped: more and more models keep being added. `Transformers` is widely used, and we read the feedback that users post, whether it is about a function that had 300+ keyword arguments, duplicated code and helpers, mentions of `Copied from ...` everywhere, or optimisation concerns. Text-only models are relatively tame by now, but multimodal models remain to be harmonized.

Here we will dissect the design philosophy of `transformers`, as a continuation of the existing, older [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page and the accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). Some time ago (I dare not say how long), we discussed the state of things with the `transformers` maintainers. A lot of recent developments were satisfactory, but if we were only talking about those, self-congratulation would be the only goalpost. Reflecting on this philosophy now, as models pile up, is essential and will drive new developments.
### What you will learn

Every reader, whether an OSS maintainer, power user, or casual fine-tuner, will walk away knowing how to reason about the `transformers` code base, how to use it better, and how to meaningfully contribute to it.
This will also showcase new features you might have missed, so you'll be up to date.

So, what are the principles of `transformers`? We will try to summarize the foundations on which we've built everything, and write down the "tenets" of the library. They behave like _software interfaces_, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.

0. <a id="source-of-truth"></a> Overarching "guideline": we should be a source of truth for all model definitions. This is not a tenet, but it still guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performance.
1. <a id="one-model-one-file"></a> One model, one file: all inference logic (and most of the training logic; the loss is separate, not part of the model) visible, top to bottom.
2. <a id="code-is-product"></a> Code is the product: optimize for reading, diffing, and tweaking; our users are power users. Variables can be explicit, full words, even several words: readability is paramount.
3. <a id="standardize-dont-abstract"></a> Standardize, don’t abstract: if it’s model behavior, keep it in the file; abstractions are only for generic infra.
4. <a id="do-repeat-yourself"></a> DRY* (DO Repeat Yourself) via the copy mechanism: copy when it helps users; keep successors in sync without centralizing behavior.
   - We amend this tenet. With the introduction and global adoption of [`modular`](#modular) transformers, we do not repeat any logic in the `modular` files, but the end-user modeling files remain faithful to the original tenet (see the sketch after this list).
5. <a id="minimal-user-api"></a> Minimal user API: config, model, preprocessing; `from_pretrained`, `save_pretrained`, `push_to_hub`. We want the fewest possible code paths. Reading should be obvious, and configurations should be obvious.
6. <a id="backwards-compatibility"></a> Backwards compatibility first: evolve by additive standardization, **never** break public APIs.
   - Some models see almost no use, and we have also stopped adding new features for non-`torch` frameworks. Still, we adapt to the models that exist on the Hub.
7. <a id="consistent-public-surface"></a> Consistent public surface, enforced by tests: same argument names, same outputs, hidden states and attentions exposed.
8. We are not a modular toolbox: components should be separable, and users are encouraged to use PyTorch directly for further usage.
   - This is the largest change. We ARE a toolbox. What we are not is a framework: you should not be FORCED to rewrite every modeling file, but it is _better_ for your model to inherit from `PreTrainedModel` and get tensor parallelism, `from_pretrained`, sharding, `push_to_hub`, and the loss, as well as PEFT/TRL/SGLang/vLLM support (see the sketch below).
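To make the amendment of tenet 4 concrete, here is a minimal, hypothetical sketch of what a `modular` file can look like, assuming a `NewModel` that departs only slightly from Llama (the model, file, and class names are illustrative, not an actual entry in the library):

```python
# modular_newmodel.py (hypothetical): the modular file declares only the deltas
# with respect to an existing model. A converter then expands it into a full,
# self-contained modeling_newmodel.py, so the end-user file still honors
# "one model, one file" while no logic is repeated here.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaForCausalLM, LlamaMLP


class NewModelConfig(LlamaConfig):
    # Only the deltas live here; everything else is inherited from Llama.
    model_type = "newmodel"


class NewModelMLP(LlamaMLP):
    # Imagine the only architectural change is in the MLP; the generated
    # modeling file will contain the fully unrolled code for it.
    pass


class NewModelForCausalLM(LlamaForCausalLM):
    config_class = NewModelConfig
```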
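Likewise, to illustrate the amended last tenet, here is a minimal sketch (with a hypothetical `ToyConfig` and `ToyModel`) of what inheriting from `PreTrainedModel` brings, even when the internals are plain PyTorch:

```python
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel


class ToyConfig(PretrainedConfig):
    model_type = "toy"

    def __init__(self, vocab_size=1000, hidden_size=64, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class ToyModel(PreTrainedModel):
    config_class = ToyConfig

    def __init__(self, config):
        super().__init__(config)
        # The internals are ordinary PyTorch modules.
        self.embed = nn.Embedding(config.vocab_size, config.hidden_size)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size)

    def forward(self, input_ids):
        return self.lm_head(self.embed(input_ids))


model = ToyModel(ToyConfig())
model.save_pretrained("toy-model")                    # serialization handled for you
reloaded = ToyModel.from_pretrained("toy-model")      # weight loading handled for you
# model.push_to_hub("username/toy-model")             # one call away from sharing it
```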
When a PR is merged, it is because the contribution is worthwhile, and because the `transformers` team finds the design of the contribution to be aligned with what is above.