Commit e09ea3f by Serega6678 (parent: 3edb90e): Update README.md
Files changed (1): README.md (+65, -0)
pipeline_tag: token-classification
inference: false
---

# SOTA Entity Recognition English Foundation Model by NuMind 🔥

This model provides the best embeddings for entity recognition tasks in English.

**Check out other models by NuMind:**
* SOTA Multilingual Entity Recognition Foundation Model: [link](https://huggingface.co/numind/entity-recognition-multilingual-general-sota-v1)
* SOTA Sentiment Analysis Foundation Model: [English](https://huggingface.co/numind/generic-sentiment-v1), [Multilingual](https://huggingface.co/numind/generic-sentiment-multi-v1)

## About

[RoBERTa-base](https://huggingface.co/roberta-base) fine-tuned on the [NuNER dataset](https://huggingface.co/datasets/numind/NuNER).
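
The training data is hosted on the Hub, so it can be inspected directly. A minimal sketch using the `datasets` library (the split and column layout are whatever the dataset card defines, so inspect the object before relying on specific fields):

```python
from datasets import load_dataset

# Download the NuNER dataset used to fine-tune this model.
nuner = load_dataset("numind/NuNER")

# Print the available splits and columns before assuming a schema.
print(nuner)
```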

**Metrics:**

Read more about the evaluation protocol and datasets in our [paper](https://arxiv.org/abs/2402.15343) and [blog post](https://www.numind.ai/blog/a-foundation-model-for-entity-recognition).

| Model | F1 macro |
|----------|----------|
| RoBERTa-base | 0.7129 |
| ours (last hidden state) | 0.7500 |
| ours + two emb (two concatenated hidden states) | 0.7686 |

## Usage

Embeddings can be used out of the box or fine-tuned on specific datasets; a fine-tuning sketch follows the embedding example below.

Get embeddings:

```python
import torch
import transformers

# Load the encoder with all hidden states exposed so intermediate layers can be reused.
model = transformers.AutoModel.from_pretrained(
    'numind/NuNER-v1.0',
    output_hidden_states=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    'numind/NuNER-v1.0'
)

text = [
    "NuMind is an AI company based in Paris and USA.",
    "See other models from us on https://huggingface.co/numind"
]
encoded_input = tokenizer(
    text,
    return_tensors='pt',
    padding=True,
    truncation=True
)
output = model(**encoded_input)

# For better quality: concatenate the last hidden state with an earlier layer,
# giving one (batch, seq_len, 2 * hidden_size) embedding per token.
emb = torch.cat(
    (output.hidden_states[-1], output.hidden_states[-7]),
    dim=2
)

# For better speed: use the last hidden state alone.
# emb = output.hidden_states[-1]
```
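
To fine-tune on a labelled NER dataset, one option is to train a small token-classification head on top of these embeddings. The sketch below is illustrative rather than the recipe from the paper: the tag set, the frozen-encoder (linear probe) setup, and the dummy per-token labels are all assumptions, and real labels must be aligned to the tokenizer's subword tokens.

```python
import torch
import transformers

model = transformers.AutoModel.from_pretrained(
    'numind/NuNER-v1.0',
    output_hidden_states=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained('numind/NuNER-v1.0')

labels = ["O", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]  # hypothetical tag set
hidden_size = model.config.hidden_size
# The head sees 2 * hidden_size features because two layers are concatenated.
head = torch.nn.Linear(2 * hidden_size, len(labels))

# Linear-probe setting: keep the encoder frozen and train only the head.
for p in model.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# One toy training step on a single sentence with dummy labels
# (in practice, align gold entity spans to subword tokens).
text = ["NuMind is an AI company based in Paris and USA."]
encoded = tokenizer(text, return_tensors='pt', padding=True, truncation=True)
token_labels = torch.zeros(encoded['input_ids'].shape, dtype=torch.long)

with torch.no_grad():
    output = model(**encoded)
emb = torch.cat((output.hidden_states[-1], output.hidden_states[-7]), dim=2)

logits = head(emb)  # (batch, seq_len, num_labels)
loss = loss_fn(logits.view(-1, len(labels)), token_labels.view(-1))
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Once such a probe works, unfreezing the encoder and lowering the learning rate is a common next step, at the cost of a heavier training run.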