chkla committed on
Commit 1eb1f55
•
1 Parent(s): 5756e85

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -4,7 +4,7 @@ Welcome to **RoBERTArg**!
 
 This model was trained on ~25k heterogeneous manually annotated sentences (📚 Stab et al. 2018) of controversial topics to classify text into one of two labels: 🏷 **NON-ARGUMENT** (0) and **ARGUMENT** (1).
 
-**Dataset**
+🗃 **Dataset**
 
 The dataset (📚 Stab et al. 2018) consists of **ARGUMENTS** (\~11k) that either support or oppose a topic if it includes a relevant reason for supporting or opposing the topic, or as a **NON-ARGUMENT** (\~14k) if it does not include reasons. The authors focus on controversial topics, i.e., topics that include an obvious polarity to the possible outcomes and compile a final set of eight controversial topics: _abortion, school uniforms, death penalty, marijuana legalization, nuclear energy, cloning, gun control, and minimum wage_.
 
@@ -19,7 +19,7 @@ The dataset (📚 Stab et al. 2018) consists of **ARGUMENTS** (\~11k) that eithe
 | gun control | 325 | 1,889 |
 | minimum wage | 325 | 1,346 |
 
-**Model training**
+🏃🏼‍♂️ **Model training**
 
 **RoBERTArg** was fine-tuned on a RoBERTA (base) pre-trained model from HuggingFace using the HuggingFace trainer with the following hyperparameters. The hyperparameters were determined using a hyperparameter search on a 20% validation set.
 
@@ -33,7 +33,7 @@ training_args = TrainingArguments(
 )
 ```
 
-**Evaluation**
+📊 **Evaluation**
 
 The model was evaluated using 20% of the sentences (80-20 train-test split).
 
@@ -48,7 +48,7 @@ Showing the **confusion matrix** using the 20% of the sentences as an evaluation
 | ARGUMENT | 2213 | 558 |
 | NON-ARGUMENT | 325 | 1790 |
 
-**Intended Uses & Potential Limitations**
+⚠️ **Intended Uses & Potential Limitations**
 
 The model can be a starting point to dive into the exciting area of argument mining. But be aware. An argument is a complex structure, topic-dependent, and often differs between different text types. Therefore, the model may perform less well on different topics and text types, which are not included in the training set.
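As a side note on the evaluation section above: the model card reports a confusion matrix but no aggregate scores. The standard metrics can be derived directly from the four cells. The sketch below assumes rows are the gold labels and columns the predicted labels (the card does not label the axes), with ARGUMENT treated as the positive class.

```python
# Sketch: standard metrics from the card's confusion matrix.
# Assumption: rows = gold label, columns = predicted label,
# ARGUMENT = positive class (the card does not label the axes).

def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Cell values copied from the card's confusion matrix.
m = metrics(tp=2213, fn=558, fp=325, tn=1790)
print({k: round(v, 3) for k, v in m.items()})
# Under the assumed orientation: accuracy ≈ 0.819, F1 ≈ 0.834.
```

If the axes are oriented the other way, swap `fn` and `fp`; accuracy is unaffected, while precision and recall exchange values.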