xiaowu0162 committed f6d4e26 (parent: 0113c8b): Update README.md

README.md CHANGED
@@ -12,8 +12,7 @@ tags:
 
 This is a [sentence-transformers](https://www.SBERT.net) model specialized for phrases: It maps phrases to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
-This model is based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and further fine-tuned on
-
+This model is based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and further fine-tuned on 1 million keyphrase data with SimCSE.
 
 ## Citing & Authors
 Paper: [KPEval: Towards Fine-grained Semantic-based Evaluation of Keyphrase Extraction and Generation Systems](https://arxiv.org/abs/2303.15422)
@@ -40,10 +39,10 @@ Then you can use the model like this:
 
 ```python
 from sentence_transformers import SentenceTransformer
-
+phrases = ["information retrieval", "text mining", "natural language processing"]
 
 model = SentenceTransformer('{MODEL_NAME}')
-embeddings = model.encode(
+embeddings = model.encode(phrases)
 print(embeddings)
 ```
 
@@ -63,14 +62,14 @@ def mean_pooling(model_output, attention_mask):
 
 
 # Sentences we want sentence embeddings for
-
+phrases = ["information retrieval", "text mining", "natural language processing"]
 
 # Load model from HuggingFace Hub
 tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
 model = AutoModel.from_pretrained('{MODEL_NAME}')
 
 # Tokenize sentences
-encoded_input = tokenizer(
+encoded_input = tokenizer(phrases, padding=True, truncation=True, return_tensors='pt')
 
 # Compute token embeddings
 with torch.no_grad():
@@ -86,6 +85,17 @@ print(sentence_embeddings)
 ## Training
 The model was trained with the parameters:
 
+**Datasets**:
+| Dataset Name | Number of Phrases |
+|----------------------------------------------------------------|-------------------|
+| [KP20k](https://www.aclweb.org/anthology/P17-1054/)            | 715369            |
+| [KPTimes](https://www.aclweb.org/anthology/W19-8617/)          | 113456            |
+| [StackEx](https://www.aclweb.org/anthology/2020.acl-main.710/) | 8149              |
+| [OpenKP](https://www.aclweb.org/anthology/D19-1521/)           | 200335            |
+| **Total**                                                      | **1030309**       |
+
+The model was trained with the parameters:
+
 **DataLoader**:
 
 `torch.utils.data.dataloader.DataLoader` of length 2025 with parameters:
@@ -118,7 +128,6 @@ Parameters of the fit()-Method:
 }
 ```
 
-
 ## Full Model Architecture
 ```
 SentenceTransformer(
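The mean-pooling step used in the transformers-based snippet above can be exercised standalone on dummy tensors, with no model download needed. This sketch mirrors the `mean_pooling` helper referenced in the hunk headers and assumes only PyTorch; the tensor shapes and values are illustrative:

```python
import torch

def mean_pooling(model_output, attention_mask):
    # First element of model_output contains the per-token embeddings
    token_embeddings = model_output[0]
    # Expand the mask to the embedding dimension so padding tokens contribute zero
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum over real tokens and divide by the count of real tokens (clamped to avoid /0)
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9)

# Dummy batch: 2 phrases, 3 tokens each, 4-dimensional embeddings
token_embeddings = torch.tensor(
    [[[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0], [100.0, 100.0, 100.0, 100.0]],
     [[2.0, 2.0, 2.0, 2.0], [4.0, 4.0, 4.0, 4.0], [6.0, 6.0, 6.0, 6.0]]])
attention_mask = torch.tensor([[1, 1, 0],   # third token of phrase 0 is padding
                               [1, 1, 1]])

pooled = mean_pooling((token_embeddings,), attention_mask)
print(pooled)  # row 0 averages only the two unmasked tokens -> 2.0; row 1 -> 4.0
```

Note how the padding token (the row of 100s) is excluded from phrase 0's average; this is exactly why the README pairs `padding=True` in the tokenizer call with mask-aware pooling rather than a plain `.mean(dim=1)`.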