---
language: hu
license: apache-2.0
datasets:
- common_crawl
- wikipedia
tags:
- byte representation
- gradient boosting
- hungarian
---

# Charmen-Electra

A byte-based transformer model trained on Hungarian text. To use the model you need a custom tokenizer, which is available at [https://github.com/szegedai/byte-offset-tokenizer](https://github.com/szegedai/byte-offset-tokenizer).
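
To illustrate what "byte-based" means in general, the sketch below maps text to UTF-8 byte values and back using only the Python standard library. It does not reproduce the byte-offset-tokenizer linked above; the helper names `text_to_byte_ids` and `byte_ids_to_text` are hypothetical, and the real tokenizer may add special tokens or offsets that are not shown here.

```python
# Illustration only: plain UTF-8 byte encoding, NOT the byte-offset-tokenizer.
# Both helper names are hypothetical and exist only for this example.

def text_to_byte_ids(text: str) -> list[int]:
    """Return the UTF-8 byte values (0-255) of the given text."""
    return list(text.encode("utf-8"))


def byte_ids_to_text(ids: list[int]) -> str:
    """Rebuild the text from a list of UTF-8 byte values."""
    return bytes(ids).decode("utf-8")


sample = "szőlő"  # Hungarian for "grape"
ids = text_to_byte_ids(sample)

# Accented letters such as "ő" take two bytes in UTF-8,
# so 5 characters become 7 byte IDs.
print(len(sample), len(ids))
print(byte_ids_to_text(ids) == sample)
```

Because Hungarian accented letters occupy more than one byte, a byte-level input sequence is longer than the character count of the same text.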

Because the model uses a custom architecture with gradient boosting, down-sampling and up-sampling, you have to enable trusted remote code when loading it:

```python
from transformers import AutoModel
model = AutoModel.from_pretrained("SzegedAI/charmen-electra", trust_remote_code=True)
```

# Acknowledgement
[![Artificial Intelligence - National Laboratory - Hungary](https://milab.tk.hu/uploads/images/milab_logo_en.png)](https://mi.nemzetilabor.hu/)