Commit cf3d414
Parent(s): bdaa33f
Limitations & Biases: Model Card Update (#3)
Co-authored-by: Ezi Ozoani <Ezi@users.noreply.huggingface.co>
README.md CHANGED
@@ -60,6 +60,19 @@ The supervised training tasks datasets can be downloaded on [Link](https://www.d

The model could be used to generate Lisp-inspired DSL code given human-language task descriptions.

+## Risks, Limitations and Biases
+
+
+As detailed in this model’s [publication](https://arxiv.org/pdf/2104.02443.pdf), the model makes use of the [One Billion Word Language Model Benchmark corpus](https://www.researchgate.net/publication/259239818_One_Billion_Word_Benchmark_for_Measuring_Progress_in_Statistical_Language_Modeling) dataset to gather its self-supervised English data samples.
+
+Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
+Language models pretrained on text corpora such as the One Billion Word Language Model Benchmark corpus have also been examined directly: [Ngo, Araújo et al. (2021)](https://www.researchgate.net/publication/355582954_No_News_is_Good_News_A_Critique_of_the_One_Billion_Word_Benchmark) report that models trained on this corpus
+> “generate text in the linguistic style of news, without any grounding in the real world. In addition to potential harms from models which are inadvertently optimized for generating fake news.”
+
+The same publication further warns that the One Billion Word Language Model Benchmark corpus
+> contains sentences which contain words commonly found on blocklists. While these sentences may have plausibly been used in expository contexts within the article, the destructive sentence-level preprocessing and shuffling applied to lm1b [the One Billion Word Language Model Benchmark corpus] removes all long-range structure from the text and makes it infeasible to track the context and intent of individual examples.
+
+[Ngo, Araújo et al. (2021)](https://www.researchgate.net/publication/355582954_No_News_is_Good_News_A_Critique_of_the_One_Billion_Word_Benchmark)

## Training
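For reference, the usage described in the diff above (generating Lisp-inspired DSL code from a human-language description) would look roughly as follows with the `transformers` library. This is a minimal sketch, not part of the commit: the checkpoint ID `SEBIS/code_trans_t5_base_program_synthese` and the T5-style seq2seq architecture are assumptions based on the linked publication, so substitute the actual ID of this repository.

```python
# Minimal sketch of the usage described in the model card. Assumptions:
# the checkpoint is a T5-style seq2seq model on the Hugging Face Hub, and
# "SEBIS/code_trans_t5_base_program_synthese" stands in for this repo's ID.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "SEBIS/code_trans_t5_base_program_synthese"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# A human-language description of the desired program.
description = "given an array of numbers a and a number b, add b to each element of a"

# Tokenize the description and generate the Lisp-inspired DSL code.
inputs = tokenizer(description, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```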