---
license: unknown
language:
- si
metrics:
- perplexity
library_name: transformers
tags:
- AshenBerto
- Sinhala
- Roberta
---
### 🌟 Overview
This is a slightly smaller RoBERTa model trained on half of the Sinhala [FastText](https://fasttext.cc/docs/en/crawl-vectors.html) dataset. Since Sinhala is a low-resource language, there’s a noticeable lack of pre-trained models available for it. 😕 This gap makes it harder to represent the language properly in the world of NLP.
But hey, that’s where this model comes in! 🚀 It opens up exciting opportunities to improve tasks like sentiment analysis, machine translation, named entity recognition, or even question answering—tailored just for Sinhala. 🇱🇰✨
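For instance, plugging AshenBERTo into a downstream task just means adding a task head on top. Here’s a minimal sketch for sentiment analysis, assuming a binary label set; the Sinhala example sentence is illustrative only, and the classification head stays untrained until you fine-tune it on labelled data:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load AshenBERTo with a fresh (untrained) binary classification head
tokenizer = AutoTokenizer.from_pretrained("ashenR/AshenBERTo")
model = AutoModelForSequenceClassification.from_pretrained(
    "ashenR/AshenBERTo", num_labels=2
)

# Score one Sinhala sentence ("a good movie"; illustrative only)
inputs = tokenizer("හොඳ චිත්‍රපටයක්", return_tensors="pt")
logits = model(**inputs).logits  # raw scores for the two classes
```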
---
### 🛠 Model Specs
Here’s what powers this model (we went with [RoBERTa](https://arxiv.org/abs/1907.11692)):
1️⃣ **vocab_size** = 25,000
2️⃣ **max_position_embeddings** = 514
3️⃣ **num_attention_heads** = 12
4️⃣ **num_hidden_layers** = 6
5️⃣ **type_vocab_size** = 1
🎯 **Perplexity Value**: 3.5
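If you’d like to see what those numbers mean in code, here’s a minimal sketch that rebuilds the same architecture with 🤗 `RobertaConfig`. This is just for illustration; the published checkpoint already ships with this configuration:
```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Rebuild the architecture from the specs listed above
config = RobertaConfig(
    vocab_size=25_000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)
model = RobertaForMaskedLM(config)  # randomly initialized, untrained
print(model.num_parameters())       # rough size of the network
```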
---
### 🚀 How to Use
You can jump right in and use this model for masked language modeling! 🧩
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the model and tokenizer (AutoModelWithLMHead is deprecated,
# so we use AutoModelForMaskedLM instead)
model = AutoModelForMaskedLM.from_pretrained("ashenR/AshenBERTo")
tokenizer = AutoTokenizer.from_pretrained("ashenR/AshenBERTo")

# Create a fill-mask pipeline
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Try it out with a Sinhala sentence! 🇱🇰
fill_mask("මම ගෙදර <mask>.")
```
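The pipeline returns a ranked list of candidate fills; each entry includes the completed sentence, the predicted token, and its score. A quick way to inspect the top suggestions:
```python
# Print each candidate token with its score
for prediction in fill_mask("මම ගෙදර <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```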