Commit 440ec4b by Eappelson
Parent(s): cc81b9e

Update README.md

Files changed (1):
  1. README.md (+7 -24)

README.md CHANGED
```diff
@@ -18,7 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # predicting_misdirection
 
-This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on an unknown dataset.
+This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the `misdirection.csv` dataset.
+The data is cleaned by selecting the relevant columns and keeping only rows labeled 'accepted' or 'rejected'. The rows are then grouped by a unique identifier, the text entries within each group are concatenated into paragraphs, and these paragraphs serve as the predictors (X). The target labels (y) are derived from the final submission grade, mapping 'accepted' to 'violation' and 'rejected' to 'non-violation'. Finally, the data is split into training and test sets using stratified sampling with a 20% test size and a random state of 1 for reproducibility.
+
 It achieves the following results on the evaluation set:
 - Loss: 1.0736
 - Accuracy: 0.6937
```
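The data-preparation steps described in the added paragraph could be sketched roughly as follows. This is a sketch under assumptions, not the author's actual code: the column names (`conversation_id`, `text`, `submission_grade`) are hypothetical, since the `misdirection.csv` schema is not shown, and an inline DataFrame stands in for loading the CSV.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("misdirection.csv"); the column names here
# (conversation_id, text, submission_grade) are hypothetical.
rows = []
for i in range(10):
    grade = "accepted" if i % 2 == 0 else "rejected"
    rows.append({"conversation_id": i, "text": f"prompt {i}", "submission_grade": grade})
    rows.append({"conversation_id": i, "text": f"reply {i}", "submission_grade": grade})
df = pd.DataFrame(rows)

# Keep only rows graded 'accepted' or 'rejected'.
df = df[df["submission_grade"].isin(["accepted", "rejected"])]

# Concatenate each group's text entries into one paragraph.
grouped = df.groupby("conversation_id").agg(
    text=("text", " ".join),
    grade=("submission_grade", "last"),
).reset_index()

# Map the final submission grade to the two target labels.
grouped["label"] = grouped["grade"].map(
    {"accepted": "violation", "rejected": "non-violation"}
)

X, y = grouped["text"], grouped["label"]

# Stratified 80/20 split with a fixed random state for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)
```

Stratification keeps the 'violation'/'non-violation' ratio the same in both splits, which matters when the class balance is uneven.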
```diff
@@ -28,17 +30,13 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+The code begins by loading a DistilBERT model and tokenizer configured for sequence classification with two labels. It then preprocesses the data: the training and test text sequences are tokenized, with padding and truncation to a uniform length of 256 tokens.
+A CustomDataset class organizes the tokenized data into a format suitable for PyTorch training, converting the labels ('non-violation' and 'violation') into numeric values. Evaluation metrics (accuracy, precision, recall, and F1 score) are set up to assess model performance.
+The main task is hyperparameter optimization with Optuna. An objective function optimizes the dropout rate, learning rate, batch size, number of epochs, and weight decay. For each trial, the data is tokenized again, a new model is initialized with the chosen dropout rate, and a Trainer object manages training and evaluation with these parameters. The goal is to maximize the F1 score across 15 trials.
 
 ## Intended uses & limitations
 
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
+Created solely for the Humane Intelligence Algorithmic Bias Bounty.
 
 ### Training hyperparameters
 
```
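The CustomDataset wrapper described in the Model description could look roughly like this minimal sketch. The label-to-id mapping ('non-violation' → 0) and the fake tokenizer output are assumptions; the real class would subclass `torch.utils.data.Dataset`, return tensors, and wrap DistilBERT tokenizer encodings padded/truncated to 256 tokens.

```python
# Assumed label order; the actual mapping is not shown in the README.
LABEL2ID = {"non-violation": 0, "violation": 1}

class CustomDataset:
    """Pairs tokenizer encodings with numeric labels, item by item.

    The real version would subclass torch.utils.data.Dataset and return
    torch tensors so a transformers.Trainer can batch it directly.
    """

    def __init__(self, encodings, labels):
        self.encodings = encodings  # dict with input_ids, attention_mask
        self.labels = [LABEL2ID[label] for label in labels]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: values[idx] for key, values in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

# Fake tokenizer output for two short examples; real ids would come from
# DistilBertTokenizer(..., padding="max_length", truncation=True, max_length=256).
enc = {
    "input_ids": [[101, 7592, 102], [101, 2088, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
ds = CustomDataset(enc, ["violation", "non-violation"])
```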
```diff
@@ -53,21 +51,6 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 9
 
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
-|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
-| 0.6794        | 1.0   | 28   | 0.6453          | 0.6396   | 0.6908    | 0.6396 | 0.5791 |
-| 0.5817        | 2.0   | 56   | 0.6867          | 0.6396   | 0.6663    | 0.6396 | 0.5924 |
-| 0.3639        | 3.0   | 84   | 0.7680          | 0.6216   | 0.6184    | 0.6216 | 0.6192 |
-| 0.2073        | 4.0   | 112  | 0.8974          | 0.6757   | 0.6732    | 0.6757 | 0.6687 |
-| 0.0729        | 5.0   | 140  | 1.0736          | 0.6937   | 0.6916    | 0.6937 | 0.6917 |
-| 0.1303        | 6.0   | 168  | 1.1722          | 0.6667   | 0.6638    | 0.6667 | 0.6639 |
-| 0.0675        | 7.0   | 196  | 1.4547          | 0.6577   | 0.6597    | 0.6577 | 0.6396 |
-| 0.0682        | 8.0   | 224  | 1.3582          | 0.6486   | 0.6517    | 0.6486 | 0.6497 |
-| 0.0678        | 9.0   | 252  | 1.3401          | 0.6486   | 0.6496    | 0.6486 | 0.6491 |
-
-
 ### Framework versions
 
 - Transformers 4.41.2
```
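The Optuna search described in the Model description could be sketched as below. The search-space bounds and the `train_and_eval` placeholder are assumptions; only the five tuned parameters and the 15 trials maximizing F1 come from the README. The real objective re-tokenizes the data, re-initializes the model with the sampled dropout, and runs a `transformers.Trainer`.

```python
def objective(trial):
    # Sample one candidate configuration per trial; the bounds below are
    # illustrative, not the author's actual search space.
    params = {
        "dropout": trial.suggest_float("dropout", 0.1, 0.5),
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [8, 16, 32]),
        "num_epochs": trial.suggest_int("num_epochs", 2, 10),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.3),
    }
    return train_and_eval(params)

def train_and_eval(params):
    # Placeholder: the real function would tokenize the data, build the
    # model with params["dropout"], train it via transformers.Trainer with
    # the remaining params, and return the evaluation F1 score.
    return 0.0

# With Optuna installed, the study maximizes F1 over 15 trials:
#   import optuna
#   study = optuna.create_study(direction="maximize")
#   study.optimize(objective, n_trials=15)
#   best = study.best_params
```

Re-initializing the model inside each trial is important: reusing weights from a previous trial would let earlier training leak into later evaluations.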
 