egonrp/gpt2-small-portuguese

This model is a copy of Pierre Guillou's model "pierreguillou/gpt2-small-portuguese" with a "tokenizer.json" file added.

Link: https://huggingface.co/pierreguillou/gpt2-small-portuguese

This repository fixes the error below, which is raised when the original model's tokenizer is loaded with "Tokenizer.from_pretrained()":

>>> from tokenizers import Tokenizer
>>> tokenizer = Tokenizer.from_pretrained("pierreguillou/gpt2-small-portuguese")
[2023-03-11T18:59:51Z ERROR cached_path::cache] ETAG fetch for https://huggingface.co/pierreguillou/gpt2-small-portuguese/resolve/main/tokenizer.json failed with fatal error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Exception: Model "pierreguillou/gpt2-small-portuguese" on the Hub doesn't have a tokenizer
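
The traceback above can also be handled in code. The following is a minimal sketch, not part of the original model card: the helper name `load_with_fallback` and its `loader` parameter are hypothetical and introduced only for illustration. It tries each repository in turn and returns the first one whose tokenizer loads, assuming the failure surfaces as a plain `Exception`, as in the traceback.

```python
from typing import Callable, Optional, Sequence, Tuple


def load_with_fallback(
    repo_ids: Sequence[str] = (
        "pierreguillou/gpt2-small-portuguese",  # original repo, lacks tokenizer.json
        "egonrp/gpt2-small-portuguese",         # this repo, includes tokenizer.json
    ),
    loader: Optional[Callable[[str], object]] = None,
) -> Tuple[str, object]:
    """Return (repo_id, tokenizer) for the first repository that loads.

    `loader` is a hypothetical hook so the logic can be exercised offline;
    by default it is `tokenizers.Tokenizer.from_pretrained`.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without the package installed.
        from tokenizers import Tokenizer

        loader = Tokenizer.from_pretrained

    last_error = None
    for repo_id in repo_ids:
        try:
            return repo_id, loader(repo_id)
        except Exception as err:  # the traceback above shows a plain Exception
            last_error = err
    raise RuntimeError(
        f"No repository in {list(repo_ids)} provided a tokenizer"
    ) from last_error
```

With network access and the `tokenizers` package installed, `repo_id, tokenizer = load_with_fallback()` would be expected to return this repository's tokenizer when the original one fails to load.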

If you want to generate the "tokenizer.json" file on your own...

The following commands generate the missing "tokenizer.json" file from the original model:

# Downloading the HuggingFace model:
$ python.exe
>>> import huggingface_hub
>>> huggingface_hub.snapshot_download(repo_id="pierreguillou/gpt2-small-portuguese", local_dir=r"C:\dev\pierre_gpt2_small_pt")

# Generating "tokenizer.json" file:
$ cd C:\dev\pierre_gpt2_small_pt
$ python.exe
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained(r"C:\dev\pierre_gpt2_small_pt")
>>> tokenizer._tokenizer.save("tokenizer.json")
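
The same steps can be collected into a single script. This is a minimal sketch under stated assumptions, not the author's own tooling: the function name `ensure_tokenizer_json` is hypothetical, the target directory is passed in by the caller instead of being hard-coded, and the public `backend_tokenizer` property of fast tokenizers is used in place of the private `_tokenizer` attribute (it is only available when `AutoTokenizer` returns a fast tokenizer).

```python
from pathlib import Path


def ensure_tokenizer_json(model_dir, repo_id="pierreguillou/gpt2-small-portuguese"):
    """Download `repo_id` into `model_dir` and write tokenizer.json there.

    Returns the path to tokenizer.json. If the file already exists,
    nothing is downloaded or regenerated.
    """
    model_dir = Path(model_dir)
    target = model_dir / "tokenizer.json"
    if target.is_file():
        return target

    # Imported lazily: only needed when the file actually has to be built.
    from huggingface_hub import snapshot_download
    from transformers import AutoTokenizer

    # Step 1: download the Hugging Face model files.
    snapshot_download(repo_id=repo_id, local_dir=str(model_dir))

    # Step 2: load the tokenizer and serialize its backend to tokenizer.json.
    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    tokenizer.backend_tokenizer.save(str(target))
    return target
```

Calling `ensure_tokenizer_json(r"C:\dev\pierre_gpt2_small_pt")` would mirror the commands above; any other writable directory should work the same way.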

Author (model "pierreguillou/gpt2-small-portuguese")

Portuguese GPT-2 small was trained and evaluated by Pierre Guillou thanks to the computing power of the GPU (NVIDIA V100, 32 GB) of the AI Lab (University of Brasilia), where the author is an Associate Researcher in NLP, and to the participation of its directors, Professors Fabricio Ataides Braz and Nilton Correia da Silva, in defining the NLP strategy.

Citation

If you use our work, please cite:

@inproceedings{pierre2020gpt2smallportuguese,
  title={GPorTuguese-2 (Portuguese GPT-2 small): a Language Model for Portuguese text generation (and more NLP tasks...)},
  author={Pierre Guillou},
  year={2020}
}
