# Digging through tenets and time

## Introduction

The `transformers` library, built with `PyTorch`, supports all state-of-the-art LLMs, many VLMs, task-specific vision-language models, video models, audio models, table models, and classical encoders, for a global count of almost 400 models. The name of the library itself is mostly majority-driven, as many of these models, like Mamba or RWKV, are not even transformer architectures. Regardless, each of them was wrought by the research and engineering team that created it, then harmonized into a now-famous interface and made callable with a simple `.from_pretrained`. Both inference and training are supported. The library underpins ML courses and cookbooks, and several thousand other open-source libraries depend on it. All models are tested as part of a daily CI that ensures their preservation and reproducibility. Most importantly, it is open source and has, for a large part, been written by the community.
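As a small illustration (using `"gpt2"` as an arbitrary checkpoint and the standard Auto classes), that harmonized interface boils down to a couple of lines:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the ~400 architectures loads through the same call.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The design philosophy of transformers is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```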
The ML wave has not stopped: more and more models keep being added. `Transformers` is widely used, and we read the feedback that users post, whether it is about a function that had 300+ keyword arguments, duplicated code and helpers, mentions of `Copied from ...` everywhere, or optimisation concerns. Text-only models are relatively tame by now, but multimodal models remain to be harmonized.

Here we will dissect the design philosophy of `transformers`, as a continuation of the existing, older [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page and the accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). Some time ago (I dare not say how long), we discussed the state of things with the `transformers` maintainers. A lot of recent developments were satisfactory, but if we were only talking about those, self-congratulation would be the only goalpost. Reflecting on this philosophy now, as models pile up, is essential and will drive new developments.
### What you will learn

Every reader, whether an OSS maintainer, power user, or casual fine-tuner, will walk away knowing how to reason about the `transformers` code base, how to use it better, and how to meaningfully contribute to it.
This will also showcase new features you might have missed, so you'll be up to date.

So, what are the principles of `transformers`? We will try to summarize the foundations on which we've built everything, and write down the "tenets" of the library. They behave like _software interfaces_, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.

0. <a id="source-of-truth"></a> Overarching "guideline": we should be a source of truth for all model definitions. This is not a tenet, but it still guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performance.
1. <a id="one-model-one-file"></a> One model, one file: all inference logic (and most of the training logic; the loss is separate, not part of the model) visible, top to bottom.
2. <a id="code-is-product"></a> Code is the product: optimize for reading, diffing, and tweaking; our users are power users. Variables can be explicit, full words, even several words: readability is paramount.
3. <a id="standardize-dont-abstract"></a> Standardize, don’t abstract: if it’s model behavior, keep it in the file; abstractions are only for generic infra.
4. <a id="do-repeat-yourself"></a> DRY* (DO Repeat Yourself) via the copy mechanism: copy when it helps users; keep successors in sync without centralizing behavior.
   - We amend this tenet. With the introduction and global adoption of [`modular`](#modular) transformers, we do not repeat any logic in the `modular` files, but the end-user modeling files remain faithful to the original tenet (see the sketch after this list).
5. <a id="minimal-user-api"></a> Minimal user API: config, model, preprocessing; `from_pretrained`, `save_pretrained`, `push_to_hub`. We want the fewest possible code paths. Reading should be obvious, and configurations should be obvious.
6. <a id="backwards-compatibility"></a> Backwards compatibility first: evolve by additive standardization, **never** break public APIs.
   - Some models see almost no use, and we have also stopped adding new features for non-`torch` frameworks. Still, we adapt to the models that exist on the Hub.
7. <a id="consistent-public-surface"></a> Consistent public surface, enforced by tests: same argument names, same outputs, hidden states and attentions exposed.
8. We are not a modular toolbox: components should be separable, and users are encouraged to use PyTorch directly for further usage.
   - This is the largest change. We ARE a toolbox. What we are not is a framework: you should not be FORCED to rewrite every modeling file, but it is _better_ for your model to inherit from `PreTrainedModel` and get tensor parallelism, `from_pretrained`, sharding, `push_to_hub`, and the loss, as well as PEFT/TRL/SGLang/vLLM support (see the sketch below).
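To make the amendment of tenet 4 concrete, here is a minimal, hypothetical sketch of what a `modular` file can look like, assuming a `NewModel` that departs only slightly from Llama (the model, file, and class names are illustrative, not an actual entry in the library):

```python
# modular_newmodel.py (hypothetical): the modular file declares only the deltas
# with respect to an existing model. A converter then expands it into a full,
# self-contained modeling_newmodel.py, so the end-user file still honors
# "one model, one file" while no logic is repeated here.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaForCausalLM, LlamaMLP


class NewModelConfig(LlamaConfig):
    # Only the deltas live here; everything else is inherited from Llama.
    model_type = "newmodel"


class NewModelMLP(LlamaMLP):
    # Imagine the only architectural change is in the MLP; the generated
    # modeling file will contain the fully unrolled code for it.
    pass


class NewModelForCausalLM(LlamaForCausalLM):
    config_class = NewModelConfig
```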
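Likewise, to illustrate the amended last tenet, here is a minimal sketch (with a hypothetical `ToyConfig` and `ToyModel`) of what inheriting from `PreTrainedModel` brings, even when the internals are plain PyTorch:

```python
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel


class ToyConfig(PretrainedConfig):
    model_type = "toy"

    def __init__(self, vocab_size=1000, hidden_size=64, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class ToyModel(PreTrainedModel):
    config_class = ToyConfig

    def __init__(self, config):
        super().__init__(config)
        # The internals are ordinary PyTorch modules.
        self.embed = nn.Embedding(config.vocab_size, config.hidden_size)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size)

    def forward(self, input_ids):
        return self.lm_head(self.embed(input_ids))


model = ToyModel(ToyConfig())
model.save_pretrained("toy-model")                    # serialization handled for you
reloaded = ToyModel.from_pretrained("toy-model")      # weight loading handled for you
# model.push_to_hub("username/toy-model")             # one call away from sharing it
```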
When a PR is merged, it is because the contribution is worthwhile, and because the `transformers` team finds the design of the contribution to be aligned with what is above.