sarahwei/MITRE-tactic-bert-case-based


sarahwei committed on Jun 25

Commit de68312 • 1 Parent(s): f1ec54d

Update README.md

Files changed (1)

README.md +71 -1


@@ -4,4 +4,74 @@ language:
 - en
 base_model: bencyc1129/mitre-bert-base-cased
 pipeline_tag: text-classification
----

+---
+## MITRE-tactic-bert-case-based
+This model is fine-tuned from [mitre-bert-base-cased](https://huggingface.co/bencyc1129/mitre-bert-base-cased) on the [MITRE](https://attack.mitre.org/) procedure dataset. On the evaluation dataset it achieves:
+- loss: 0.057
+- accuracy: 0.87
+## Intended uses & limitations
+You can use the fine-tuned model for text classification. It identifies which tactic(s) of the MITRE ATT&CK framework a sentence belongs to.
+Because a sentence or an attack may fall under several tactics, the model can return more than one label.
+Note that this model is fine-tuned primarily for text classification in the cybersecurity domain.
+It may not perform well on sentences that are unrelated to attacks.
+## How to use
+You can use the model with PyTorch.
+```python
+import numpy as np
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+model_id = "sarahwei/MITRE-tactic-bert-case-based"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    # device_map="auto",
+)
+
+question = 'An attacker performs a SQL injection.'
+inputs = tokenizer(question, return_tensors="pt")
+# Inference only, so skip gradient tracking.
+with torch.no_grad():
+    logits = model(**inputs).logits
+# Multi-label output: apply an independent sigmoid to each tactic logit.
+# Cast to float32 first because NumPy has no bfloat16 dtype.
+probs = torch.sigmoid(logits.squeeze().float().cpu()).numpy()
+# Mark every tactic whose probability reaches the 0.5 threshold.
+predictions = np.zeros(probs.shape)
+predictions[np.where(probs >= 0.5)] = 1
+predicted_labels = [model.config.id2label[idx] for idx, label in enumerate(predictions) if label == 1.0]
+```
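The thresholding step at the end of the snippet above can be pulled out into a small helper, which makes it easy to check without downloading the model. This is a minimal sketch and not part of the model's API: `decode_tactics`, the example logits, and the three-entry `example_id2label` mapping are hypothetical stand-ins for real model output and `model.config.id2label`.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_tactics(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs at or above the threshold.

    Hypothetical helper: `id2label` stands in for `model.config.id2label`,
    and `logits` for one row of raw model outputs.
    """
    selected = []
    for idx, logit in enumerate(logits):
        # Each tactic is scored independently, so several can be selected.
        prob = sigmoid(logit)
        if prob >= threshold:
            selected.append((id2label[idx], prob))
    # Most confident tactic first.
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Illustrative values only; these are not real model outputs,
# and the index-to-label mapping is an assumption.
example_id2label = {0: "Initial Access", 1: "Execution", 2: "Persistence"}
example_logits = [2.0, -1.0, 0.5]
for label, prob in decode_tactics(example_logits, example_id2label):
    print(f"{label}: {prob:.3f}")
```

Returning the probabilities alongside the labels lets a caller raise or lower `threshold` later without rerunning the model.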
+## Training procedure
+### Training hyperparameters
+- learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 0
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 10
+- warmup_ratio: 0.01
+- weight_decay: 0.001
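To make the `lr_scheduler_type: linear` and `warmup_ratio: 0.01` settings concrete, the sketch below computes the learning rate at a given optimizer step for a linear warmup followed by a linear decay to zero. It assumes the warmup length is the total step count times the warmup ratio, rounded up. The total of 1200 steps is a placeholder taken from the last step logged in the results table, not a confirmed training length, and `linear_lr` is a hypothetical helper rather than a library function.

```python
import math


def linear_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.01):
    """Learning rate at `step` for linear warmup then linear decay.

    Hypothetical helper; the defaults mirror the hyperparameters listed above.
    """
    # Assumption: warmup length is the ratio of total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1200  # assumption: the real run may have used a different total
for step in (0, 6, 12, 600, 1200):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With these assumed values the warmup covers only the first 12 steps, so nearly the whole run is spent in the decay phase.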
+### Training results
+| Step | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
+|:----:|:-------------:|:---------------:|:--:|:-------:|:--------:|
+| 100 | 0.409400 | 0.142982 | 0.740000 | 0.803830 | 0.610000 |
+| 200 | 0.106500 | 0.093503 | 0.818182 | 0.868382 | 0.720000 |
+| 300 | 0.070200 | 0.065937 | 0.893617 | 0.930366 | 0.810000 |
+| 400 | 0.045500 | 0.061865 | 0.892704 | 0.926625 | 0.830000 |
+| 500 | 0.033600 | 0.057814 | 0.902954 | 0.938630 | 0.860000 |
+| 600 | 0.026000 | 0.062982 | 0.894515 | 0.934107 | 0.840000 |
+| 700 | 0.021900 | 0.056275 | 0.904564 | 0.946113 | 0.870000 |
+| 800 | 0.017700 | 0.061058 | 0.887967 | 0.937067 | 0.860000 |
+| 900 | 0.016100 | 0.058965 | 0.890756 | 0.933716 | 0.870000 |
+| 1000 | 0.014200 | 0.055885 | 0.903766 | 0.942372 | 0.880000 |
+| 1100 | 0.013200 | 0.056888 | 0.895397 | 0.937849 | 0.880000 |
+| 1200 | 0.012700 | 0.057484 | 0.895397 | 0.937849 | 0.870000 |