cahya committed
Commit 5e5fa4f
1 Parent(s): 1545f56

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -13,13 +13,13 @@ and first released at [this page](https://openai.com/blog/better-language-models
  This model was trained using HuggingFace's Flax framework and is part of the [JAX/Flax Community Week](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104)
  organized by [HuggingFace](https://huggingface.co). All training was done on a TPUv3-8 VM sponsored by the Google Cloud team.
 
- The demo can be found [here](https://huggingface.co/spaces/flax-community/gpt2-indonesian).
 
  ## How to use
  You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:
  ```python
  >>> from transformers import pipeline, set_seed
- >>> generator = pipeline('text-generation', model='flax-community/gpt2-medium-indonesian')
  >>> set_seed(42)
  >>> generator("Sewindu sudah kita tak berjumpa,", max_length=30, num_return_sequences=5)
 
@@ -35,8 +35,8 @@ Tuhan akan memberi lebih dari apa yang kita'}]
  Here is how to use this model to get the features of a given text in PyTorch:
  ```python
  from transformers import GPT2Tokenizer, GPT2Model
- tokenizer = GPT2Tokenizer.from_pretrained('flax-community/gpt2-medium-indonesian')
- model = GPT2Model.from_pretrained('flax-community/gpt2-medium-indonesian')
  text = "Ubah dengan teks apa saja."
  encoded_input = tokenizer(text, return_tensors='pt')
  output = model(**encoded_input)
@@ -45,8 +45,8 @@ output = model(**encoded_input)
  and in TensorFlow:
  ```python
  from transformers import GPT2Tokenizer, TFGPT2Model
- tokenizer = GPT2Tokenizer.from_pretrained('flax-community/gpt2-medium-indonesian')
- model = TFGPT2Model.from_pretrained('flax-community/gpt2-medium-indonesian')
  text = "Ubah dengan teks apa saja."
  encoded_input = tokenizer(text, return_tensors='tf')
  output = model(encoded_input)
@@ -70,7 +70,7 @@ As the openAI team themselves point out in their [model card](https://github.com
  > race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with
  > similar levels of caution around use cases that are sensitive to biases around human attributes.
 
- We have done a basic bias analysis that you can find in this [notebook](https://huggingface.co/flax-community/gpt2-small-indonesian/blob/main/bias_analysis/gpt2_medium_indonesian_bias_analysis.ipynb), performed on [Indonesian GPT2 medium](https://huggingface.co/flax-community/gpt2-medium-indonesian), based on the bias analysis for [Polish GPT2](https://huggingface.co/flax-community/papuGaPT2) with modifications.
 
  ### Gender bias
  We generated 50 texts starting with prompts "She/He works as". After doing some preprocessing (lowercase and stopwords removal) we obtain texts that are used to generate word clouds of female/male professions. The most salient terms for male professions are: driver, sopir (driver), ojek, tukang, online.
@@ -13,13 +13,13 @@ and first released at [this page](https://openai.com/blog/better-language-models
  This model was trained using HuggingFace's Flax framework and is part of the [JAX/Flax Community Week](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104)
  organized by [HuggingFace](https://huggingface.co). All training was done on a TPUv3-8 VM sponsored by the Google Cloud team.
 
+ The demo can be found [here](https://huggingface.co/spaces/indonesian-nlp/gpt2-app).
 
  ## How to use
  You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:
  ```python
  >>> from transformers import pipeline, set_seed
+ >>> generator = pipeline('text-generation', model='indonesian-nlp/gpt2-medium-indonesian')
  >>> set_seed(42)
  >>> generator("Sewindu sudah kita tak berjumpa,", max_length=30, num_return_sequences=5)
 
@@ -35,8 +35,8 @@ Tuhan akan memberi lebih dari apa yang kita'}]
  Here is how to use this model to get the features of a given text in PyTorch:
  ```python
  from transformers import GPT2Tokenizer, GPT2Model
+ tokenizer = GPT2Tokenizer.from_pretrained('indonesian-nlp/gpt2-medium-indonesian')
+ model = GPT2Model.from_pretrained('indonesian-nlp/gpt2-medium-indonesian')
  text = "Ubah dengan teks apa saja."
  encoded_input = tokenizer(text, return_tensors='pt')
  output = model(**encoded_input)
@@ -45,8 +45,8 @@ output = model(**encoded_input)
  and in TensorFlow:
  ```python
  from transformers import GPT2Tokenizer, TFGPT2Model
+ tokenizer = GPT2Tokenizer.from_pretrained('indonesian-nlp/gpt2-medium-indonesian')
+ model = TFGPT2Model.from_pretrained('indonesian-nlp/gpt2-medium-indonesian')
  text = "Ubah dengan teks apa saja."
  encoded_input = tokenizer(text, return_tensors='tf')
  output = model(encoded_input)
@@ -70,7 +70,7 @@ As the openAI team themselves point out in their [model card](https://github.com
  > race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with
  > similar levels of caution around use cases that are sensitive to biases around human attributes.
 
+ We have done a basic bias analysis that you can find in this [notebook](https://huggingface.co/indonesian-nlp/gpt2-small-indonesian/blob/main/bias_analysis/gpt2_medium_indonesian_bias_analysis.ipynb), performed on [Indonesian GPT2 medium](https://huggingface.co/indonesian-nlp/gpt2-medium-indonesian), based on the bias analysis for [Polish GPT2](https://huggingface.co/flax-community/papuGaPT2) with modifications.
 
  ### Gender bias
  We generated 50 texts starting with prompts "She/He works as". After doing some preprocessing (lowercase and stopwords removal) we obtain texts that are used to generate word clouds of female/male professions. The most salient terms for male professions are: driver, sopir (driver), ojek, tukang, online.
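
The Gender bias context line above mentions lowercasing and stopword removal before building word clouds. A minimal sketch of that kind of preprocessing is shown below. It is not the notebook's code: the function name `salient_terms`, the stopword list, and the sample sentences are hypothetical placeholders chosen for illustration, and the samples are hand-written, not model output.

```python
from collections import Counter

# Hypothetical, tiny Indonesian stopword list for illustration only;
# the list used in the actual notebook is not shown in this diff.
STOPWORDS = {"dia", "bekerja", "sebagai", "di", "dan", "yang", "seorang"}


def salient_terms(texts, stopwords=STOPWORDS, top_n=5):
    """Lowercase each text, drop stopwords, and count the remaining terms."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Strip surrounding punctuation so "sopir." and "sopir" are counted together.
        tokens = [t.strip(".,!?\"'") for t in tokens]
        counts.update(t for t in tokens if t and t not in stopwords)
    return counts.most_common(top_n)


# Hand-written sample sentences (assumed inputs, not generated by the model).
samples = [
    "Dia bekerja sebagai sopir ojek online.",
    "Dia bekerja sebagai tukang di bengkel.",
    "Dia bekerja sebagai Sopir.",
]

print(salient_terms(samples, top_n=3))
```

The resulting term counts are the kind of input a word cloud is typically built from; in practice the texts would come from the `generator` pipeline rather than a hard-coded list.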