---
license: mit
base_model: xlm-roberta-base
tags:
- silvanus
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: xlm-roberta-base-ner-silvanus
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: id_nergrit_corpus
      type: id_nergrit_corpus
      config: ner
      split: validation
      args: ner
    metrics:
    - name: Precision
      type: precision
      value: 0.918918918918919
    - name: Recall
      type: recall
      value: 0.9272727272727272
    - name: F1
      type: f1
      value: 0.9230769230769231
    - name: Accuracy
      type: accuracy
      value: 0.9858518778229216
language:
- id
- en
- es
- it
- sk
pipeline_tag: token-classification
widget:
- text: >-
    Kebakaran hutan dan lahan terus terjadi dan semakin meluas di Kota
    Palangkaraya, Kalimantan Tengah (Kalteng) pada hari Rabu, 15 Nopember 2023
    20.00 WIB. Bahkan kobaran api mulai membakar pondok warga dan mendekati
    permukiman. BZK #RCTINews #SeputariNews #News #Karhutla #KebakaranHutan
    #HutanKalimantan #SILVANUS_Italian_Pilot_Testing
  example_title: Indonesia
- text: >-
    Wildfire rages for a second day in Evia destroying a Natura 2000 protected
    pine forest. - 5:51 PM Aug 14, 2019
  example_title: English
- text: >-
    3 nov 2023 21:57 - Incendio forestal obliga a la evacuación de hasta 850
    personas cerca del pueblo de Montichelvo en Valencia.
  example_title: Spanish
- text: >-
    Incendi boschivi nell'est del Paese: 2 morti e oltre 50 case distrutte nello
    stato del Queensland.
  example_title: Italian
- text: >-
    Lesné požiare na Sicílii si vyžiadali dva ľudské životy a evakuáciu hotela
    http://dlvr.it/SwW3sC - 23. septembra 2023 20:57
  example_title: Slovak
---

# xlm-roberta-base-ner-silvanus

This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the Indonesian NERGRIT corpus (`id_nergrit_corpus`).
It achieves the following results on the evaluation set (a sketch of how such metrics are typically computed follows the list):
- Loss: 0.0567
- Precision: 0.9189
- Recall: 0.9273
- F1: 0.9231
- Accuracy: 0.9859
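
The card does not include the evaluation code. Entity-level precision/recall/F1 plus token accuracy are what the `seqeval` metric reports, so the numbers above were most likely computed along these lines (a toy sketch; `evaluate` and `seqeval` are assumed tooling, not confirmed by the card):

```python
# Toy sketch of entity-level metric computation with seqeval (assumed tooling;
# the actual evaluation script is not part of this card).
import evaluate

seqeval = evaluate.load("seqeval")  # requires the `seqeval` package

predictions = [["B-LOC", "I-LOC", "O", "B-DAT"]]
references = [["B-LOC", "I-LOC", "O", "B-TIM"]]

results = seqeval.compute(predictions=predictions, references=references)
print({k: results[k] for k in
       ("overall_precision", "overall_recall", "overall_f1", "overall_accuracy")})
```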

## Model description

The XLM-RoBERTa model was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data.

- **Developed by:** See [associated paper](https://arxiv.org/abs/1911.02116)
- **Model type:** Multi-lingual model
- **Language(s) (NLP):** XLM-RoBERTa is a multilingual model pretrained on 100 languages; see the [GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr) for the full list. This checkpoint is fine-tuned on an Indonesian dataset.
- **License:** MIT
- **Related Models:** [RoBERTa](https://huggingface.co/roberta-base), [XLM](https://huggingface.co/docs/transformers/model_doc/xlm)
    - **Parent Model:** [XLM-RoBERTa](https://huggingface.co/xlm-roberta-base)
- **Resources for more information:** [GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr)

## Intended uses & limitations

This model extracts multilingual information such as locations, dates, and times from social media posts (Twitter, etc.). Because it was fine-tuned only on Indonesian data, its performance in the four other evaluated languages (English, Spanish, Italian, and Slovak) relies on zero-shot cross-lingual transfer, so accuracy outside Indonesian may be lower. A minimal inference sketch follows.
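
A minimal usage sketch with the `transformers` pipeline; the Hub namespace below is a placeholder (substitute the actual repo id), and the example text is the Indonesian widget example from this card:

```python
from transformers import pipeline

# "<namespace>" is a placeholder: substitute the actual Hub repo id.
ner = pipeline(
    "token-classification",
    model="<namespace>/xlm-roberta-base-ner-silvanus",
    aggregation_strategy="simple",  # merge B-/I- word pieces into whole spans
)

text = (
    "Kebakaran hutan dan lahan terus terjadi dan semakin meluas di Kota "
    "Palangkaraya, Kalimantan Tengah (Kalteng) pada hari Rabu, "
    "15 Nopember 2023 20.00 WIB."
)

for ent in ner(text):
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
# Expected entity groups: LOC (location), DAT (date), TIM (time)
```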

## Training and evaluation data

This model was fine-tuned on the Indonesian NERGRIT corpus (`id_nergrit_corpus`), using the IOB2-style tag set below; a sketch of the implied label mapping follows the table.

| Abbreviation | Description |
|:-------------|:------------|
| O            | Outside of a named entity |
| B-LOC        | Beginning of a location entity |
| I-LOC        | Inside a location entity |
| B-DAT        | Beginning of a date entity |
| I-DAT        | Inside a date entity |
| B-TIM        | Beginning of a time entity |
| I-TIM        | Inside a time entity |
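
If you work with raw model outputs rather than the pipeline, the tag set above implies a label mapping along these lines (the id ordering here is an assumption for illustration; the authoritative mapping is `model.config.id2label`):

```python
# Assumed id ordering, for illustration only; read the checkpoint's
# config (model.config.id2label) for the authoritative mapping.
id2label = {
    0: "O",
    1: "B-LOC", 2: "I-LOC",
    3: "B-DAT", 4: "I-DAT",
    5: "B-TIM", 6: "I-TIM",
}
label2id = {label: i for i, label in id2label.items()}
```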

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
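
A sketch of how the listed hyperparameters map onto `transformers.TrainingArguments`; the Adam betas and epsilon are the `Trainer` defaults, so they need no explicit setting, and dataset preparation plus the `Trainer` call itself are omitted:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder,
# and per-epoch evaluation is an assumption matching the results table below.
args = TrainingArguments(
    output_dir="xlm-roberta-base-ner-silvanus",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    evaluation_strategy="epoch",
)
```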

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| 0.1394        | 1.0   | 827  | 0.0559          | 0.8808    | 0.9257 | 0.9027 | 0.9842   |
| 0.0468        | 2.0   | 1654 | 0.0575          | 0.9107    | 0.9190 | 0.9148 | 0.9849   |
| 0.0279        | 3.0   | 2481 | 0.0567          | 0.9189    | 0.9273 | 0.9231 | 0.9859   |


### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1