Web Application
- Link: https://istoryai.ngrok.app/
- This project continues our Thesis 1, which required us to implement the machine learning model in an application. IstoryAI is a reading assistant tool for dyslexic students that comes with a web application and a browser extension. It serves as a digital intervention for students with Filipino language comprehension impairments. Its features are similar to Microsoft Immersive Reader: text-to-speech, text preferences, reading preferences, and grammar tools, with Named Entity Recognition as an additional feature. The goal of the tool is to help improve Filipino language and vocabulary acquisition for students with dyslexia, ADHD, autism, and intellectual delays.
Description
- The dataset consists of 148 Filipino storytelling books, 4,523 sentences, 7,118 tokens, and 868 unique tokens.
- This NER model supports only the Filipino language and, at the moment, does not tag proper nouns, verbs, adjectives, or adverbs
- The input must be preprocessed before inference; the preprocessing code will be uploaded to GitHub soon
- To replicate the preprocessed input, use this example as a guide:
- Input: "May umaapoy na bahay"
- Preprocessed Input: "apoy bahay"
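The example above suggests two steps: Filipino stopwords are dropped and inflected words are reduced to a root form ("umaapoy" becomes "apoy"). Since the actual preprocessing code is not yet published, the sketch below is only an illustrative assumption; the stopword list and the affix-stripping rules are placeholders, not the project's real pipeline.

```python
# Hypothetical sketch of the preprocessing step: stopword removal plus
# naive affix stripping. Both the stopword set and the prefix list are
# illustrative assumptions, not the actual IstoryAI pipeline.

FILIPINO_STOPWORDS = {"may", "na", "ang", "ng", "sa", "ay"}  # assumed subset

def strip_affixes(token: str) -> str:
    """Naively reduce an inflected word like 'umaapoy' to a root ('apoy')."""
    # Strip a leading actor-focus marker such as the 'um' infix with
    # reduplication; the prefix list here is a rough approximation.
    for prefix in ("uma", "um", "nag", "mag", "naka"):
        if token.startswith(prefix) and len(token) > len(prefix) + 2:
            return token[len(prefix):]
    return token

def preprocess(text: str) -> str:
    """Lowercase, drop stopwords, and strip affixes from each token."""
    tokens = [t for t in text.lower().split() if t not in FILIPINO_STOPWORDS]
    return " ".join(strip_affixes(t) for t in tokens)

print(preprocess("May umaapoy na bahay"))  # -> "apoy bahay"
```

This reproduces the documented example, but a real implementation would need a proper Filipino stemmer and stopword list.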
roberta-tagalog-base-ner-v1
This model is a fine-tuned version of jcblaise/roberta-tagalog-base on the Filipino storytelling dataset described above. It achieves the following results on the evaluation set:
- Loss: 0.1768
- Precision: 0.9579
- Recall: 0.9600
- F1: 0.9590
- Accuracy: 0.9744
Model description
A RoBERTa-based token-classification (NER) model for Filipino, fine-tuned from jcblaise/roberta-tagalog-base for use in the IstoryAI reading assistant.
Intended uses & limitations
Intended for Named Entity Recognition on preprocessed Filipino text within the IstoryAI application. It supports only the Filipino language and does not currently tag proper nouns, verbs, adjectives, or adverbs.
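A minimal sketch of loading the model for inference, assuming it is published on the Hugging Face Hub; the repo id below is a placeholder and must be replaced with the model's actual path.

```python
from transformers import pipeline

def load_ner(repo_id: str):
    """Load the fine-tuned NER model as a token-classification pipeline.

    aggregation_strategy="simple" merges subword tokens back into
    whole-word entity spans.
    """
    return pipeline(
        "token-classification",
        model=repo_id,
        aggregation_strategy="simple",
    )

# Usage (downloads the model weights, so it is not run here):
# ner = load_ner("your-username/roberta-tagalog-base-ner-v1")  # placeholder repo id
# ner("apoy bahay")  # input must already be preprocessed (see above)
```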
Training and evaluation data
Fine-tuned on a corpus of 148 Filipino storytelling books comprising 4,523 sentences, 7,118 tokens, and 868 unique tokens (see the Description section above).
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
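The hyperparameters above can be expressed as a Transformers `TrainingArguments` configuration. This is a reconstruction, not the author's script: `output_dir` is assumed, and the listed Adam betas/epsilon are the `Trainer` optimizer defaults, so they need no explicit arguments.

```python
from transformers import TrainingArguments

# Reconstructed training configuration from the listed hyperparameters.
training_args = TrainingArguments(
    output_dir="roberta-tagalog-base-ner-v1",  # assumed output directory
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the Trainer defaults.
)
```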
Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log | 1.0 | 205 | 0.2106 | 0.8855 | 0.8757 | 0.8806 | 0.9368 |
| No log | 2.0 | 410 | 0.1410 | 0.9242 | 0.9334 | 0.9288 | 0.9632 |
| 0.2474 | 3.0 | 615 | 0.1511 | 0.9367 | 0.9523 | 0.9444 | 0.9674 |
| 0.2474 | 4.0 | 820 | 0.1600 | 0.9586 | 0.9501 | 0.9543 | 0.9700 |
| 0.017 | 5.0 | 1025 | 0.1586 | 0.9544 | 0.9534 | 0.9539 | 0.9726 |
| 0.017 | 6.0 | 1230 | 0.1701 | 0.9461 | 0.9545 | 0.9503 | 0.9708 |
| 0.017 | 7.0 | 1435 | 0.1703 | 0.9557 | 0.9578 | 0.9568 | 0.9732 |
| 0.0022 | 8.0 | 1640 | 0.1733 | 0.9568 | 0.9589 | 0.9579 | 0.9740 |
| 0.0022 | 9.0 | 1845 | 0.1772 | 0.9579 | 0.9600 | 0.9590 | 0.9732 |
| 0.0011 | 10.0 | 2050 | 0.1768 | 0.9579 | 0.9600 | 0.9590 | 0.9744 |
Framework versions
- Transformers 4.24.0
- Pytorch 1.12.1+cu113
- Datasets 2.7.1
- Tokenizers 0.13.2