sarahwei committed de68312 (parent: f1ec54d): Update README.md

Files changed (1): README.md (+71 -1)

language:
- en
base_model: bencyc1129/mitre-bert-base-cased
pipeline_tag: text-classification
---

## MITRE-tactic-bert-case-based

This model is a fine-tuned version of [mitre-bert-base-cased](https://huggingface.co/bencyc1129/mitre-bert-base-cased) trained on the [MITRE](https://attack.mitre.org/) procedure dataset. It achieves the following results on the evaluation set:
- loss: 0.057
- accuracy: 0.87

## Intended uses & limitations
The model can be used for text classification: it aims to identify the tactic(s) in the MITRE ATT&CK framework that a sentence belongs to. Because a sentence or an attack may fall into several tactics, the task is multi-label.

Note that this model is fine-tuned primarily for cybersecurity text classification. It may not perform well on sentences that are not related to attacks.
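
The multi-label decision rule can be sketched in plain Python. This is a hypothetical illustration, not code shipped with the model: the helper names, label names and logit values are made up, and only the 0.5 threshold comes from this card's usage example. Unlike softmax followed by argmax, each tactic is scored independently with a sigmoid, so zero, one or several tactics can be returned for one sentence.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, id2label, threshold=0.5):
    """Return every label whose independent sigmoid probability >= threshold.

    Each label is scored on its own (no softmax), so the result may
    contain zero, one, or several labels.
    """
    return [
        id2label[idx]
        for idx, logit in enumerate(logits)
        if sigmoid(logit) >= threshold
    ]


# Hypothetical label set and logits, for illustration only.
id2label = {0: "initial-access", 1: "execution", 2: "exfiltration"}
logits = [2.1, 0.3, -1.5]
print(decode_multilabel(logits, id2label))
```

With a threshold of 0.5, the rule is equivalent to keeping every label whose raw logit is at least 0.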

## How to use
You can use the model with PyTorch:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "sarahwei/MITRE-tactic-bert-case-based"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    # device_map="auto",  # optional; requires the `accelerate` package
)

question = "An attacker performs a SQL injection."
inputs = tokenizer(question, return_tensors="pt").to(model.device)

# Inference only: no gradients are needed.
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label output: score each tactic independently with a sigmoid
# and keep every tactic whose probability is at least 0.5.
probs = torch.sigmoid(logits.squeeze(0).float()).cpu()
predicted_labels = [
    model.config.id2label[idx]
    for idx, prob in enumerate(probs.tolist())
    if prob >= 0.5
]
print(predicted_labels)
```

## Training procedure
### Training parameters
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 0
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- warmup_ratio: 0.01
- weight_decay: 0.001
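
The `linear` scheduler with `warmup_ratio: 0.01` can be sketched as follows. This assumes the schedule behaves like `get_linear_schedule_with_warmup` in Hugging Face transformers (the learning rate ramps from 0 to 5e-05 during warmup, then decays linearly to 0) and that warmup lasts ceil(warmup_ratio x total steps) steps. The card does not state the total number of optimizer steps, so the 1,000 used below is hypothetical, as is the helper name `linear_schedule_lr`.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.01):
    """Learning rate at `step` under linear warmup followed by linear decay to 0.

    Warmup length is assumed to be ceil(warmup_ratio * total_steps).
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total of 1,000 optimizer steps, for illustration only.
total = 1000
for s in (0, 5, 10, 505, 1000):
    print(s, linear_schedule_lr(s, total))
```

With such a small warmup ratio, the warmup phase covers only the first 1% of training; almost the entire run is spent in the linear decay.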

### Training results

| Step | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
|:----:|:-------------:|:---------------:|:--:|:-------:|:--------:|
| 100 | 0.409400 | 0.142982 | 0.740000 | 0.803830 | 0.610000 |
| 200 | 0.106500 | 0.093503 | 0.818182 | 0.868382 | 0.720000 |
| 300 | 0.070200 | 0.065937 | 0.893617 | 0.930366 | 0.810000 |
| 400 | 0.045500 | 0.061865 | 0.892704 | 0.926625 | 0.830000 |
| 500 | 0.033600 | 0.057814 | 0.902954 | 0.938630 | 0.860000 |
| 600 | 0.026000 | 0.062982 | 0.894515 | 0.934107 | 0.840000 |
| 700 | 0.021900 | 0.056275 | 0.904564 | 0.946113 | 0.870000 |
| 800 | 0.017700 | 0.061058 | 0.887967 | 0.937067 | 0.860000 |
| 900 | 0.016100 | 0.058965 | 0.890756 | 0.933716 | 0.870000 |
| 1000 | 0.014200 | 0.055885 | 0.903766 | 0.942372 | 0.880000 |
| 1100 | 0.013200 | 0.056888 | 0.895397 | 0.937849 | 0.880000 |
| 1200 | 0.012700 | 0.057484 | 0.895397 | 0.937849 | 0.870000 |
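
The card does not define how F1 and accuracy are computed for this multi-label task. A common choice is micro-averaged F1 together with exact-match (subset) accuracy, sketched below in plain Python under that assumption. The helper names and the toy data are hypothetical. ROC AUC is omitted because it requires the raw probabilities rather than thresholded predictions.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over binary label vectors.

    True positives, false positives and false negatives are pooled across
    all examples and all labels before computing 2TP / (2TP + FP + FN).
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Exact-match accuracy: an example counts only if every label is correct."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Hypothetical gold labels and thresholded predictions: 3 sentences x 3 tactics.
y_true = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
print(micro_f1(y_true, y_pred), subset_accuracy(y_true, y_pred))
```

On this toy data, one extra tactic on the second sentence costs that whole example under exact-match accuracy (2 of 3 correct) but counts as only a single false positive under micro F1 (8/9). If the accuracy column is exact-match accuracy, this would be consistent with it sitting below the F1 column; the card itself does not say.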