Commit c45a414 by giotvr (1 parent: 99dadba)

Update README.md

Files changed (1)
  1. README.md +12 -7
README.md CHANGED
@@ -13,7 +13,7 @@ metrics:
 
  This is an XLM-RoBERTa-base model fine-tuned on 5K (premise, hypothesis) sentence pairs from
  the ASSIN (Avaliação de Similaridade Semântica e Inferência textual) corpus. Both the original corpus
- and XLM-RoBERTa-base model can be found here and the original reference papers are:
+ and XLM-RoBERTa-base model can be found here. The original reference papers are:
  Unsupervised Cross-Lingual Representation Learning At Scale, ASSIN: Avaliação de Similaridade Semântica e
  Inferência Textual, respectively. This model is suitable for Portuguese (from Brazil or Portugal).
 
@@ -27,7 +27,7 @@ Inferência Textual, respectively. This model is suitable for Portuguese (from
 
  - **Developed by:** Giovani Tavares and Felipe Ribas Serras
  - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
+ - **Model type:** Transformer-based text classifier
  - **Language(s) (NLP):** Portuguese
  - **License:** [More Information Needed]
  - **Finetuned from model [optional]:** [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base)
@@ -37,7 +37,7 @@ Inferência Textual, respectively. This model is suitable for Portuguese (from
  <!-- Provide the basic links for the model. -->
 
  - **Repository:** [Natural-Portuguese-Language-Inference](https://github.com/giogvn/Natural-Portuguese-Language-Inference)
- - **Paper [optional]:** [More Information Needed]
+ - **Paper [optional]:** This is ongoing research. We are currently writing a paper describing our experiments.
  - **Demo [optional]:** [More Information Needed]
 
  ## Uses
@@ -80,18 +80,23 @@ Use the code below to get started with the model.
 
  [More Information Needed]
 
- ## Training Details
+ ## Fine-Tuning Details
 
- ### Training Data
+ ### Fine-Tuning Data
 
  <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
+ This is a fine-tuned version of [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base) using the [ASSIN (Avaliação de Similaridade Semântica e Inferência textual)](https://huggingface.co/datasets/assin)
+ dataset. [ASSIN](https://huggingface.co/datasets/assin) is a corpus of annotated Portuguese premise/hypothesis sentence pairs suitable for detecting entailment, paraphrase, or neutral
+ relationships between the members of each pair. The corpus has three subsets: *ptbr* (Brazilian Portuguese), *ptpt* (European Portuguese) and *full* (the union of the two). The *full* subset has
+ $10k$ sentence pairs, split equally between the *ptbr* and *ptpt* subsets.
  [More Information Needed]
 
- ### Training Procedure
+ ### Fine-Tuning Procedure
 
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
+ The fine-tuning procedure can be summarized in three major consecutive tasks:
+ i
  #### Preprocessing [optional]
 
  [More Information Needed]
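
The Fine-Tuning Data paragraph added in this commit describes ASSIN's three subsets and its three relationship classes. As a minimal sketch, and not the authors' actual pipeline, the snippet below shows how a subset might be loaded with the Hugging Face `datasets` library and how label counts could be tallied. The dataset id `assin` comes from the link in the README; the field name `entailment_judgment` and the label ids are assumptions that should be checked against the dataset card. `load_assin`, `label_distribution`, and the toy records are hypothetical names introduced only for illustration.

```python
from collections import Counter

# Assumed label ids for ASSIN's `entailment_judgment` field; verify against the dataset card.
ID2LABEL = {0: "NONE", 1: "ENTAILMENT", 2: "PARAPHRASE"}


def load_assin(subset="full"):
    """Load one ASSIN subset ("full", "ptbr" or "ptpt") from the Hugging Face Hub.

    Hypothetical helper: requires `pip install datasets` and network access.
    """
    if subset not in {"full", "ptbr", "ptpt"}:
        raise ValueError(f"unknown ASSIN subset: {subset!r}")
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset("assin", subset)


def label_distribution(examples):
    """Count entailment labels over an iterable of ASSIN-style records."""
    return Counter(ID2LABEL[ex["entailment_judgment"]] for ex in examples)


# Toy records that only mimic the assumed schema; they are not taken from ASSIN.
toy_pairs = [
    {"premise": "O gato dorme no sofá.", "hypothesis": "Um gato está dormindo.", "entailment_judgment": 1},
    {"premise": "Ela comprou um carro.", "hypothesis": "Ela adquiriu um automóvel.", "entailment_judgment": 2},
    {"premise": "Chove em Lisboa.", "hypothesis": "O jogo foi cancelado.", "entailment_judgment": 0},
    {"premise": "O menino corre no parque.", "hypothesis": "Uma criança está no parque.", "entailment_judgment": 1},
]

counts = label_distribution(toy_pairs)
print(counts)
```

With network access, and assuming the subset exposes a `train` split, `label_distribution(load_assin("full")["train"])` would tally the labels of the pairs used for fine-tuning.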