AswanthCManoj committed on
Commit
26e912c
1 Parent(s): fda50f3

Model added

README.md CHANGED
@@ -1,3 +1,89 @@
  ---
  license: apache-2.0
+ tags:
+ - generated_from_trainer
+ metrics:
+ - precision
+ - recall
+ - accuracy
+ - f1
+ language:
+ - en
+ widget:
+ - text: "Broadcom agreed to acquire cloud computing company VMware in a $61 billion (€57bn) cash-and stock deal, massively diversifying the chipmaker’s business and almost tripling its software-related revenue to about 45% of its total sales. By the numbers: VMware shareholders will receive either $142.50 in cash or 0.2520 of a Broadcom share for each VMware stock. Broadcom will also assume $8 billion of VMware's net debt."
+ - text: "Canadian Natural Resources Minister Jonathan Wilkinson told Bloomberg that the country could start supplying Europe with liquefied natural gas (LNG) in as soon as three years by converting an existing LNG import facility on Canada’s Atlantic coast into an export terminal. Bottom line: Wilkinson said what Canada cares about is that the new LNG facility uses a low-emission process for the gas and is capable of transitioning to exporting hydrogen later on."
+ - text: "Google is being investigated by the UK’s antitrust watchdog for its dominance in the \"ad tech stack,\" the set of services that facilitate the sale of online advertising space between advertisers and sellers. Google has strong positions at various levels of the ad tech stack and charges fees to both publishers and advertisers. A step back: UK Competition and Markets Authority has also been investigating whether Google and Meta colluded over ads, probing into the advertising agreement between the two companies, codenamed Jedi Blue."
+ - text: "Shares in Twitter closed 6.35% up after an SEC 13D filing revealed that Elon Musk pledged to put up an additional $6.25 billion of his own wealth to fund the $44 billion takeover deal, lifting the total to $33.5 billion from an initial $27.25 billion. In other news: Former Twitter CEO Jack Dorsey announced he's stepping down, but would stay on Twitter’s board \"until his term expires at the 2022 meeting of stockholders.\""
+ model-index:
+ - name: bert-uncased-keyword-extractor
+   results: []
  ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # bert-uncased-keyword-extractor
+
+ This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.1247
+ - Precision: 0.8547
+ - Recall: 0.8825
+ - Accuracy: 0.9741
+ - F1: 0.8684
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ ### Use a pipeline as a high-level helper
+ from transformers import pipeline
+
+ pipe = pipeline("token-classification", model="Azma-AI/bert-uncased-keyword-extractor")
+
+ ### Load model directly
+ from transformers import AutoTokenizer, AutoModelForTokenClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("Azma-AI/bert-uncased-keyword-extractor")
+ model = AutoModelForTokenClassification.from_pretrained("Azma-AI/bert-uncased-keyword-extractor")
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 16
+ - eval_batch_size: 16
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 8
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | Accuracy | F1 |
+ |:-------------:|:-----:|:-----:|:---------------:|:---------:|:------:|:--------:|:------:|
+ | 0.165 | 1.0 | 1875 | 0.1202 | 0.7109 | 0.7766 | 0.9505 | 0.7423 |
+ | 0.1211 | 2.0 | 3750 | 0.1011 | 0.7801 | 0.8186 | 0.9621 | 0.7989 |
+ | 0.0847 | 3.0 | 5625 | 0.0945 | 0.8292 | 0.8044 | 0.9667 | 0.8166 |
+ | 0.0614 | 4.0 | 7500 | 0.0927 | 0.8409 | 0.8524 | 0.9711 | 0.8466 |
+ | 0.0442 | 5.0 | 9375 | 0.1057 | 0.8330 | 0.8738 | 0.9712 | 0.8529 |
+ | 0.0325 | 6.0 | 11250 | 0.1103 | 0.8585 | 0.8743 | 0.9738 | 0.8663 |
+ | 0.0253 | 7.0 | 13125 | 0.1204 | 0.8453 | 0.8825 | 0.9735 | 0.8635 |
+ | 0.0203 | 8.0 | 15000 | 0.1247 | 0.8547 | 0.8825 | 0.9741 | 0.8684 |
+
+
+ ### Framework versions
+
+ - Transformers 4.19.2
+ - Pytorch 1.11.0+cu113
+ - Datasets 2.2.2
+ - Tokenizers 0.12.1
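Note on the README metrics: the F1 column in the training-results table can be cross-checked against the precision and recall columns, since F1 is their harmonic mean. The sketch below recomputes it per epoch and also reports which epoch has the lowest validation loss and which has the highest F1. The helper name `f1_from_pr` and the `ROWS` layout are illustrative and not part of the commit; the numbers are copied from the table.

```python
# Rows copied from the training-results table in README.md:
# (epoch, validation_loss, precision, recall, reported_f1)
ROWS = [
    (1, 0.1202, 0.7109, 0.7766, 0.7423),
    (2, 0.1011, 0.7801, 0.8186, 0.7989),
    (3, 0.0945, 0.8292, 0.8044, 0.8166),
    (4, 0.0927, 0.8409, 0.8524, 0.8466),
    (5, 0.1057, 0.8330, 0.8738, 0.8529),
    (6, 0.1103, 0.8585, 0.8743, 0.8663),
    (7, 0.1204, 0.8453, 0.8825, 0.8635),
    (8, 0.1247, 0.8547, 0.8825, 0.8684),
]


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Epoch with the lowest validation loss and epoch with the highest reported F1.
best_loss_epoch = min(ROWS, key=lambda row: row[1])[0]
best_f1_epoch = max(ROWS, key=lambda row: row[4])[0]

for epoch, _, precision, recall, reported in ROWS:
    # The table values are rounded to 4 decimals, so small differences are expected.
    print(epoch, round(f1_from_pr(precision, recall), 4), reported)

print("lowest validation loss at epoch", best_loss_epoch)
print("highest F1 at epoch", best_f1_epoch)
```

On these figures validation loss is lowest at epoch 4 (0.0927) while F1 peaks at epoch 8 (0.8684), so the final metrics quoted at the top of the card are the best by F1 but not by loss.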
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "_name_or_path": "bert-base-uncased",
+   "architectures": [
+     "BertForTokenClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "O",
+     "1": "B-KEY",
+     "2": "I-KEY"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "B-KEY": 1,
+     "I-KEY": 2,
+     "O": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.19.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
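Note on `id2label`: the config defines a three-way BIO scheme (`O`, `B-KEY`, `I-KEY`), so per-token predictions have to be merged into keyword phrases before they are useful. The sketch below shows one way to do that in plain Python. It assumes each prediction is a dict with `entity`, `start`, and `end` keys, which mirrors what the `token-classification` pipeline returns when no aggregation strategy is set. The function name `group_keywords`, the merge rule for word pieces, and the sample predictions are all illustrative: the predictions are hand-written for this example and are not output from the model.

```python
from typing import Dict, List


def group_keywords(text: str, preds: List[Dict]) -> List[str]:
    """Merge token-level B-KEY / I-KEY predictions into keyword strings."""
    spans: List[List[int]] = []  # each entry is [start, end] in character offsets
    prev_end = None
    for pred in sorted(preds, key=lambda p: p["start"]):
        if pred["entity"] == "O":
            prev_end = None  # an explicit O token closes any open span
            continue
        # Tokens count as adjacent when only whitespace separates them.
        adjacent = prev_end is not None and text[prev_end:pred["start"]].strip() == ""
        # Continue the open span on I-KEY, or on a word piece glued to the
        # previous token (assumption: such pieces belong to the same keyword).
        glued = prev_end is not None and pred["start"] == prev_end
        if spans and adjacent and (pred["entity"] == "I-KEY" or glued):
            spans[-1][1] = pred["end"]
        else:
            spans.append([pred["start"], pred["end"]])
        prev_end = pred["end"]
    return [text[start:end] for start, end in spans]


text = "Broadcom agreed to acquire cloud computing company VMware"
# Hand-written predictions for illustration only (not real model output).
sample_preds = [
    {"entity": "B-KEY", "start": 0, "end": 8},    # "Broadcom"
    {"entity": "B-KEY", "start": 27, "end": 32},  # "cloud"
    {"entity": "I-KEY", "start": 33, "end": 42},  # "computing"
    {"entity": "B-KEY", "start": 51, "end": 53},  # "VM" (first word piece)
    {"entity": "B-KEY", "start": 53, "end": 57},  # "ware" (second word piece)
]
print(group_keywords(text, sample_preds))
# -> ['Broadcom', 'cloud computing', 'VMware']
```

With the real model, `sample_preds` would be replaced by the list returned from calling the pipeline on `text`; the pipeline's own aggregation options may make a helper like this unnecessary.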
gitattributes.txt ADDED
@@ -0,0 +1,27 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
gitignore.txt ADDED
@@ -0,0 +1 @@
+ checkpoint-*/
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0903b6383ec9cbe81c3f4adfbe63842c6a3c38b455614acaa2f09e10258e124b
+ size 435646257
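Note on the LFS pointer: `pytorch_model.bin` is stored in the repository as a three-line Git LFS pointer (spec version, SHA-256 object id, size in bytes) rather than as the weights themselves. The sketch below parses such a pointer and checks a local file against it. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this note; the pointer text is copied from the diff above.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the pytorch_model.bin diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:0903b6383ec9cbe81c3f4adfbe63842c6a3c38b455614acaa2f09e10258e124b
size 435646257
"""


def parse_lfs_pointer(pointer_text: str) -> dict:
    """Split a Git LFS pointer file into version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True when the file at `path` has the size and digest the pointer records."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in chunks so a large weights file is never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"], "bytes")
```

After downloading the weights, calling `matches_pointer(Path("pytorch_model.bin"), pointer)` would confirm that the local copy is the roughly 415 MiB object this commit refers to; the local path here is an assumption about where the file was saved.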
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "bert-base-uncased", "tokenizer_class": "BertTokenizer"}
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:23e2ac56ff3dd45b56cd8fa4a4fe6d23cff38ca86689967b66ec0feba4aaf0e1
+ size 3247
vocab.txt ADDED
The diff for this file is too large to render. See raw diff