Serega6678 committed
Commit e542684
1 Parent(s): 24ea364

Update README.md

  1. README.md +73 -0
README.md CHANGED
@@ -1,3 +1,76 @@
  ---
+ language:
+ - en
  license: mit
+ tags:
+ - token-classification
+ - entity-recognition
+ - foundation-model
+ - feature-extraction
+ - BERT
+ - generic
+ datasets:
+ - numind/NuNER
+ pipeline_tag: token-classification
+ inference: false
  ---
+
+ # SOTA Entity Recognition English Foundation Model by NuMind 🔥
+
+ This model provides embeddings for the entity recognition task in English.
+
+ **We recommend trying [NuNER RoBERTa](https://huggingface.co/numind/NuNER-v0.1) first, as it usually gives better results.**
+
+ **Check out other models by NuMind:**
+ * SOTA Multilingual Entity Recognition Foundation Model: [link](https://huggingface.co/numind/entity-recognition-multilingual-general-sota-v1)
+ * SOTA Sentiment Analysis Foundation Model: [English](https://huggingface.co/numind/generic-sentiment-v1), [Multilingual](https://huggingface.co/numind/generic-sentiment-multi-v1)
+
+ ## About
+
+ [bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) fine-tuned on [NuNER data](https://huggingface.co/datasets/numind/NuNER).
+
+ **Metrics:**
+
+ Read more about the evaluation protocol and datasets in our [paper](https://arxiv.org/abs/2402.15343) and [blog post](https://www.numind.ai/blog/a-foundation-model-for-entity-recognition).
+
+ ## Usage
+
+ Embeddings can be used out of the box or fine-tuned on specific datasets.
+
+ Get embeddings:
+
+ ```python
+ import torch
+ import transformers
+
+
+ # Load the encoder and ask it to return the hidden states of every layer.
+ model = transformers.AutoModel.from_pretrained(
+     'numind/NuNER-BERT-v1.0',
+     output_hidden_states=True
+ )
+ tokenizer = transformers.AutoTokenizer.from_pretrained(
+     'numind/NuNER-BERT-v1.0'
+ )
+
+ text = [
+     "NuMind is an AI company based in Paris and USA.",
+     "See other models from us on https://huggingface.co/numind"
+ ]
+ # Pad to the longest sentence in the batch and return PyTorch tensors.
+ encoded_input = tokenizer(
+     text,
+     return_tensors='pt',
+     padding=True,
+     truncation=True
+ )
+ output = model(**encoded_input)
+
+ # for better quality: concatenate the last layer with an intermediate
+ # layer along the feature dimension (one vector per token)
+ emb = torch.cat(
+     (output.hidden_states[-1], output.hidden_states[-7]),
+     dim=2
+ )
+
+ # for better speed: use the last layer only
+ # emb = output.hidden_states[-1]
+ ```
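
As a rough illustration of using the embeddings from the added README snippet for fine-tuning, the sketch below puts a linear classification head on top of per-token vectors. It is not taken from the README or from NuMind's training code. A random tensor stands in for `emb` so the example runs without downloading the model, the feature size assumes two concatenated 768-dimensional bert-base hidden states, and the label set is hypothetical.

```python
import torch

# Stand-in for the `emb` tensor from the README snippet (assumption):
# 2 sentences, 16 tokens each, 2 * 768 features per token.
batch_size, seq_len, hidden_size = 2, 16, 768
emb = torch.randn(batch_size, seq_len, 2 * hidden_size)

# Hypothetical label set, for illustration only.
labels = ["O", "ORG", "LOC"]

# One linear layer that maps each token embedding to label logits.
head = torch.nn.Linear(2 * hidden_size, len(labels))

logits = head(emb)                # shape: (batch_size, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)  # shape: (batch_size, seq_len)

# Map predicted ids back to label strings, one list per sentence.
pred_labels = [[labels[i] for i in row] for row in pred_ids.tolist()]
```

The head here is untrained, so its predictions are arbitrary. In practice it would be trained with a token-level cross-entropy loss, and positions introduced by padding would be excluded using the tokenizer's `attention_mask`.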