indonesian-nlp/gpt2-medium-indonesian

cahya committed on May 28, 2022

Commit 5e5fa4f (1 parent: 1545f56)

Update README.md

Files changed (1)

README.md +7 -7

@@ -13,13 +13,13 @@ and first released at [this page](https://openai.com/blog/better-language-models
 This model was trained using HuggingFace's Flax framework and is part of the [JAX/Flax Community Week](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104)
 organized by [HuggingFace](https://huggingface.co). All training was done on a TPUv3-8 VM sponsored by the Google Cloud team.
-The demo can be found [here](https://huggingface.co/spaces/flax-community/gpt2-indonesian).
+The demo can be found [here](https://huggingface.co/spaces/indonesian-nlp/gpt2-app).
 ## How to use
 You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:
 ```python
 >>> from transformers import pipeline, set_seed
->>> generator = pipeline('text-generation', model='flax-community/gpt2-medium-indonesian')
+>>> generator = pipeline('text-generation', model='indonesian-nlp/gpt2-medium-indonesian')
 >>> set_seed(42)
 >>> generator("Sewindu sudah kita tak berjumpa,", max_length=30, num_return_sequences=5)
@@ -35,8 +35,8 @@ Tuhan akan memberi lebih dari apa yang kita'}]
 Here is how to use this model to get the features of a given text in PyTorch:
 ```python
 from transformers import GPT2Tokenizer, GPT2Model
-tokenizer = GPT2Tokenizer.from_pretrained('flax-community/gpt2-medium-indonesian')
-model = GPT2Model.from_pretrained('flax-community/gpt2-medium-indonesian')
+tokenizer = GPT2Tokenizer.from_pretrained('indonesian-nlp/gpt2-medium-indonesian')
+model = GPT2Model.from_pretrained('indonesian-nlp/gpt2-medium-indonesian')
 text = "Ubah dengan teks apa saja."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
@@ -45,8 +45,8 @@ output = model(**encoded_input)
 and in TensorFlow:
 ```python
 from transformers import GPT2Tokenizer, TFGPT2Model
-tokenizer = GPT2Tokenizer.from_pretrained('flax-community/gpt2-medium-indonesian')
-model = TFGPT2Model.from_pretrained('flax-community/gpt2-medium-indonesian')
+tokenizer = GPT2Tokenizer.from_pretrained('indonesian-nlp/gpt2-medium-indonesian')
+model = TFGPT2Model.from_pretrained('indonesian-nlp/gpt2-medium-indonesian')
 text = "Ubah dengan teks apa saja."
 encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
@@ -70,7 +70,7 @@ As the openAI team themselves point out in their [model card](https://github.com
 > race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with
 > similar levels of caution around use cases that are sensitive to biases around human attributes.
-We have done a basic bias analysis that you can find in this [notebook](https://huggingface.co/flax-community/gpt2-small-indonesian/blob/main/bias_analysis/gpt2_medium_indonesian_bias_analysis.ipynb), performed on [Indonesian GPT2 medium](https://huggingface.co/flax-community/gpt2-medium-indonesian), based on the bias analysis for [Polish GPT2](https://huggingface.co/flax-community/papuGaPT2) with modifications.
+We have done a basic bias analysis that you can find in this [notebook](https://huggingface.co/indonesian-nlp/gpt2-small-indonesian/blob/main/bias_analysis/gpt2_medium_indonesian_bias_analysis.ipynb), performed on [Indonesian GPT2 medium](https://huggingface.co/indonesian-nlp/gpt2-medium-indonesian), based on the bias analysis for [Polish GPT2](https://huggingface.co/flax-community/papuGaPT2) with modifications.
 ### Gender bias
 We generated 50 texts starting with prompts "She/He works as". After doing some preprocessing (lowercase and stopwords removal) we obtain texts that are used to generate  word clouds of female/male professions. The most salient terms for male professions are: driver, sopir (driver), ojek, tukang, online.
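
The preprocessing named in the Gender bias paragraph (lowercasing and stopword removal before building word clouds) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the linked notebook: the `STOPWORDS` set, the `preprocess` and `salient_terms` helpers, and the sample sentences are all hypothetical, and plain term counting stands in for the word cloud rendering.

```python
import re
from collections import Counter

# Hypothetical, tiny Indonesian stopword list for illustration only;
# the list used in the actual bias analysis is not shown in this diff.
STOPWORDS = {"dia", "bekerja", "sebagai", "dan", "di", "yang", "seorang"}


def preprocess(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def salient_terms(texts, top_n=5):
    """Count the remaining terms across all texts and return the most frequent ones."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    return counts.most_common(top_n)


# Made-up sample generations, standing in for the 50 generated texts.
samples = [
    "Dia bekerja sebagai sopir ojek online.",
    "Dia bekerja sebagai tukang dan sopir.",
]

print(salient_terms(samples, top_n=3))
```

The term frequencies returned by `salient_terms` are the kind of input a word cloud library would typically take; which library the notebook uses is not stated here.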