macedonizer committed
Commit bf0db0a
1 Parent(s): c452b04

Update README.md

Files changed (1): README.md +9 -10
README.md CHANGED
@@ -1,20 +1,20 @@
  ---
  language:
- - sr
+ - hr
- thumbnail: https://huggingface.co/macedonizer/sr-gpt2/desanka-maksimovic.jpeg
+ thumbnail: https://huggingface.co/macedonizer/hr-gpt2/lets-talk-about-nlp-hr.jpg
  license: Apache 2.0
  datasets:
- - wiki-sr
+ - wiki-hr
  ---

- # sr-gpt2
+ # hr-gpt2
  Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large
  Pretrained model using a causal language modeling (CLM) objective. It was introduced in
  [this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
  and first released at [this page](https://openai.com/blog/better-language-models/).

  ## Model description
- sr-gpt2 is a transformers model pretrained on a very large corpus of Serbian data in a self-supervised fashion. This
+ hr-gpt2 is a transformers model pretrained on a very large corpus of Croatian data in a self-supervised fashion. This
  means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots
  of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely,
  it was trained to guess the next word in sentences.
@@ -28,13 +28,12 @@ prompt.
  ### How to use
  Here is how to use this model to get the features of a given text in PyTorch:

- import random \
- from transformers import AutoTokenizer, AutoModelWithLMHead
+ import random \nfrom transformers import AutoTokenizer, AutoModelWithLMHead

- tokenizer = AutoTokenizer.from_pretrained('macedonizer/sr-gpt2') \
+ tokenizer = AutoTokenizer.from_pretrained('macedonizer/hr-gpt2') \
  model = AutoModelWithLMHead.from_pretrained('macedonizer/sr-gpt2')

- input_text = 'Ја сам био '
+ input_text = 'Ja sam bio '

  if len(input_text) == 0: \
  encoded_input = tokenizer(input_text, return_tensors="pt") \
@@ -48,7 +47,7 @@ if len(input_text) == 0: \
  ) \
  else: \
  encoded_input = tokenizer(input_text, return_tensors="pt") \
- output = model.generate( \
+ output = model.generate( \
  **encoded_input, \
  bos_token_id=random.randint(1, 50000), \
  do_sample=True, \
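
The usage snippet in the committed README relies on markdown backslash line breaks and is cut off by the diff, so it does not run as written. Below is a minimal runnable sketch, with the assumptions made explicit: both the tokenizer and the model are loaded from macedonizer/hr-gpt2 (the committed README still loads the model from macedonizer/sr-gpt2, which looks like a copy-paste leftover), the card's two if/else branches are collapsed because they are identical in the visible portion, and the generate() arguments truncated by the diff (max_length, top_k, top_p) are illustrative values, not the card's actual ones.

```python
import random

from transformers import AutoTokenizer, AutoModelWithLMHead

# Assumption: both halves load hr-gpt2; the committed README still points the
# model at 'macedonizer/sr-gpt2', which appears to be a copy-paste leftover.
tokenizer = AutoTokenizer.from_pretrained('macedonizer/hr-gpt2')
model = AutoModelWithLMHead.from_pretrained('macedonizer/hr-gpt2')

input_text = 'Ja sam bio '  # Croatian: "I was "

encoded_input = tokenizer(input_text, return_tensors="pt")

# The card samples with a random BOS id; the remaining generate() arguments
# are truncated in the diff, so the values below are illustrative defaults.
output = model.generate(
    **encoded_input,
    bos_token_id=random.randint(1, 50000),
    do_sample=True,
    max_length=125,  # assumed, not from the card
    top_k=50,        # assumed, not from the card
    top_p=0.95,      # assumed, not from the card
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```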
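Separately, the "trained to guess the next word" description in the Model description section corresponds to the standard causal language modeling loss in transformers. A minimal sketch, assuming the checkpoint loads as a regular GPT-2 causal LM: passing labels=input_ids makes the library shift the labels internally and return the next-token cross-entropy.

```python
# Sketch of the CLM objective the card describes: predict token t+1 from
# tokens <= t. Passing labels=input_ids makes transformers shift the labels
# one position internally and return the next-token cross-entropy loss.
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained('macedonizer/hr-gpt2')
model = AutoModelWithLMHead.from_pretrained('macedonizer/hr-gpt2')

enc = tokenizer("Ja sam bio student.", return_tensors="pt")  # illustrative sentence
out = model(**enc, labels=enc["input_ids"])
print(float(out.loss))  # average next-token prediction loss (nats)
```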