Upload committed on
Commit bccc26c
Parent(s): 2e53f09
Files changed (3)
  1. README.md +78 -26
  2. config.json +3 -3
  3. pytorch_model.bin +2 -2
README.md CHANGED
@@ -1,43 +1,95 @@
 
 ---
- tags: autonlp
 language: bn
- widget:
- - text: "I love AutoNLP 🤗"
- datasets:
- - albertvillanova/autonlp-data-baselines-wikiann-entity_extraction
 ---

- # Model Trained Using AutoNLP

- - Problem type: Entity Extraction
- - Model ID: 1341171

- ## Validation Metrics

- - Loss: 0.13715848326683044
- - Accuracy: 0.9730101212045483
- - Precision: 0.0
- - Recall: 0.0
- - F1: 0.0

- ## Usage

- You can use cURL to access this model:

- ```
- $ curl -X POST -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "I love AutoNLP"}' https://api-inference.huggingface.co/models/albertvillanova/autonlp-baselines-wikiann-entity_extraction-1341171
- ```

- Or Python API:

 ```
- from transformers import AutoModelForTokenClassification, AutoTokenizer

- model = AutoModelForTokenClassification.from_pretrained("albertvillanova/autonlp-baselines-wikiann-entity_extraction-1341171", use_auth_token=True)

- tokenizer = AutoTokenizer.from_pretrained("albertvillanova/autonlp-baselines-wikiann-entity_extraction-1341171", use_auth_token=True)

- inputs = tokenizer("I love AutoNLP", return_tensors="pt")

- outputs = model(**inputs)
- ```
+
 ---
 language: bn
+ tags:
+ - collaborative
+ - bengali
+ - NER
+ license: apache-2.0
+ datasets: xtreme
+ metrics:
+ - Loss
+ - Accuracy
+ - Precision
+ - Recall
 ---

+ # sahajBERT Named Entity Recognition

+ ## Model description

+ [sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for NER using the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).

+ Named entities predicted by the model:

+ | Label id | Label |
+ |:--------:|:-----:|
+ | 0 | O |
+ | 1 | B-PER |
+ | 2 | I-PER |
+ | 3 | B-ORG |
+ | 4 | I-ORG |
+ | 5 | B-LOC |
+ | 6 | I-LOC |
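+
+ The checkpoint's `config.json` appears to map ids to bare numbers only (`_num_labels: 7`), so a readable mapping has to come from this table. A minimal sketch, assuming the ids follow the table above, of passing that mapping to `from_pretrained` as config overrides:
+
+ ```python
+ # Label mapping copied from the table above (assumption: ids follow this card).
+ labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
+ id2label = dict(enumerate(labels))
+ label2id = {label: i for i, label in enumerate(labels)}
+
+ # `from_pretrained` accepts these as config overrides, so downstream
+ # pipelines emit tags like "B-PER" instead of bare ids
+ # (import AlbertForTokenClassification as in the snippet below):
+ # model = AlbertForTokenClassification.from_pretrained(
+ #     "neuropark/sahajBERT-NER", id2label=id2label, label2id=label2id
+ # )
+ ```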

+ ## Intended uses & limitations

+ #### How to use
+
+ You can use this model directly with a pipeline for token classification:
+ ```python
+ from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast
+
+ # Initialize tokenizer
+ tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")
+
+ # Initialize model
+ model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")
+
+ # Initialize pipeline
+ pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model)
+
+ raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Change me
+ output = pipeline(raw_text)
 ```
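+
+ `output` follows the standard `transformers` token-classification schema: a list with one dict per predicted token. A rough illustration of reading it (the exact `entity` strings depend on the checkpoint's `id2label`; see the label table above):
+
+ ```python
+ # Each entry looks like:
+ # {"word": ..., "entity": ..., "score": ..., "index": ..., "start": ..., "end": ...}
+ for token in output:
+     print(token["word"], token["entity"], round(float(token["score"]), 3))
+ ```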
 

+ #### Limitations and bias
+
+ <!-- Provide examples of latent issues and potential remediations. -->
+ WIP
+
+ ## Training data
+
+ The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 2489 and trained on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).
+
+ ## Training procedure
+
+ Coming soon!
+ <!-- ```bibtex
+ @inproceedings{...,
+   year={2020}
+ }
+ ``` -->
+
+ ## Eval results
+
+ accuracy: 0.9291424418604651
+
+ f1: 0.8475143403441683
+
+ loss: 0.2975200116634369
+
+ precision: 0.8254189944134078
+
+ recall: 0.8708251473477406
+
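+ The precision, recall, and F1 here are entity-level scores. A sketch of how such numbers are typically computed for WikiANN-style NER, assuming the usual `seqeval` conventions (the card does not state the exact evaluation script):
+
+ ```python
+ # Illustration only: entity-level precision/recall/F1 with seqeval.
+ # Inputs are per-sentence tag sequences in the scheme from the label table.
+ from seqeval.metrics import f1_score, precision_score, recall_score
+
+ y_true = [["B-LOC", "I-LOC", "O", "B-PER"]]  # gold tags (toy data)
+ y_pred = [["B-LOC", "I-LOC", "O", "O"]]      # predicted tags (toy data)
+
+ print(precision_score(y_true, y_pred))  # 1.0   (1 of 1 predicted entities correct)
+ print(recall_score(y_true, y_pred))     # 0.5   (1 of 2 gold entities found)
+ print(f1_score(y_true, y_pred))         # ~0.667 (harmonic mean of the two)
+ ```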

+ ### BibTeX entry and citation info

+ Coming soon!
+ <!-- ```bibtex
+ @inproceedings{...,
+   year={2020}
+ }
+ ``` -->
config.json CHANGED
@@ -1,5 +1,5 @@
 {
- "_name_or_path": "AutoNLP",
 "_num_labels": 7,
 "architectures": [
 "AlbertForTokenClassification"
@@ -36,7 +36,7 @@
 "6": 6
 },
 "layer_norm_eps": 1e-12,
- "max_length": 128,
 "max_position_embeddings": 512,
 "model_type": "albert",
 "net_structure_type": 0,
@@ -47,7 +47,7 @@
 "pad_token_id": 0,
 "padding": "max_length",
 "position_embedding_type": "absolute",
- "transformers_version": "4.5.1",
 "type_vocab_size": 2,
 "vocab_size": 32000
 }

 {
+ "_name_or_path": "albertvillanova/autonlp-wikiann-entity_extraction-0c6d343-101875",
 "_num_labels": 7,
 "architectures": [
 "AlbertForTokenClassification"

 "6": 6
 },
 "layer_norm_eps": 1e-12,
+ "max_length": 96,
 "max_position_embeddings": 512,
 "model_type": "albert",
 "net_structure_type": 0,

 "pad_token_id": 0,
 "padding": "max_length",
 "position_embedding_type": "absolute",
+ "transformers_version": "4.6.1",
 "type_vocab_size": 2,
 "vocab_size": 32000
 }
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:63e33ea5155516d7f13890f2acf28b5e3f9d23141d23d515ab62a28ed94635e1
- size 67605529

 version https://git-lfs.github.com/spec/v1
+ oid sha256:42080d3ed92e65c13c467829d36c6f803f5a64587a55089aa8f7e94dffaf62cb
+ size 67605209