FERNET-C5

FERNET-C5 (Flexible Embedding Representation NETwork) is a monolingual Czech BERT-base model pre-trained on 93 GB of the Czech Colossal Clean Crawled Corpus (C5). See our paper for details.

We have also released a successor of this model based on the RoBERTa architecture: fav-kky/FERNET-C5-RoBERTa.
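Since FERNET-C5 is a standard BERT-base checkpoint, it can be loaded with the usual Hugging Face Transformers API. A minimal sketch, assuming the model id fav-kky/FERNET-C5 and a masked-language-modeling head (the example sentence and its expected completions are illustrative, not from the paper):

```python
# Minimal usage sketch: fill a masked token with FERNET-C5 via the
# Transformers fill-mask pipeline. Requires network access to download
# the checkpoint on first use.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="fav-kky/FERNET-C5")

# Czech: "Prague is the capital [MASK] of the Czech Republic."
results = fill_mask("Praha je hlavní [MASK] České republiky.")

# Each result carries the predicted token and its score.
for r in results[:3]:
    print(r["token_str"], round(r["score"], 3))
```

For fine-tuning on downstream tasks, the checkpoint can likewise be loaded with AutoModelForSequenceClassification or related Auto classes.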

Paper

https://link.springer.com/chapter/10.1007/978-3-030-89579-2_3

The preprint of our paper is available at https://arxiv.org/abs/2107.10042.

Citation

If you find this model useful, please cite our paper:

@inproceedings{FERNETC5,
    title        = {Comparison of Czech Transformers on Text Classification Tasks},
    author       = {Lehe{\v{c}}ka, Jan and {\v{S}}vec, Jan},
    year         = 2021,
    booktitle    = {Statistical Language and Speech Processing},
    publisher    = {Springer International Publishing},
    address      = {Cham},
    pages        = {27--37},
    doi          = {10.1007/978-3-030-89579-2_3},
    isbn         = {978-3-030-89579-2},
    editor       = {Espinosa-Anke, Luis and Mart{\'i}n-Vide, Carlos and Spasi{\'{c}}, Irena}
}