xiaowu0162 committed
Commit f6d4e26
1 Parent(s): 0113c8b

Update README.md

Files changed (1): README.md (+16 -7)
README.md CHANGED
@@ -12,8 +12,7 @@ tags:
 
 This is a [sentence-transformers](https://www.SBERT.net) model specialized for phrases: It maps phrases to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
-This model is based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and further fine-tuned on large-scale keyphrase data with SimCSE.
-
+This model is based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and further fine-tuned on 1 million keyphrase data with SimCSE.
 
 ## Citing & Authors
 Paper: [KPEval: Towards Fine-grained Semantic-based Evaluation of Keyphrase Extraction and Generation Systems](https://arxiv.org/abs/2303.15422)
@@ -40,10 +39,10 @@ Then you can use the model like this:
 
 ```python
 from sentence_transformers import SentenceTransformer
-sentences = ["This is an example sentence", "Each sentence is converted"]
+phrases = ["information retrieval", "text mining", "natural language processing"]
 
 model = SentenceTransformer('{MODEL_NAME}')
-embeddings = model.encode(sentences)
+embeddings = model.encode(phrases)
 print(embeddings)
 ```
 
@@ -63,14 +62,14 @@ def mean_pooling(model_output, attention_mask):
 
 
 # Sentences we want sentence embeddings for
-sentences = ['This is an example sentence', 'Each sentence is converted']
+phrases = ["information retrieval", "text mining", "natural language processing"]
 
 # Load model from HuggingFace Hub
 tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
 model = AutoModel.from_pretrained('{MODEL_NAME}')
 
 # Tokenize sentences
-encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+encoded_input = tokenizer(phrases, padding=True, truncation=True, return_tensors='pt')
 
 # Compute token embeddings
 with torch.no_grad():
@@ -86,6 +85,17 @@ print(sentence_embeddings)
 ## Training
 The model was trained with the parameters:
 
+**Datasets**:
+| Dataset Name | Number of Phrases |
+|-------------------------------------------------------------|-------------------|
+| [KP20k](https://www.aclweb.org/anthology/P17-1054/) | 715369 |
+| [KPTimes](https://www.aclweb.org/anthology/W19-8617/) | 113456 |
+| [StackEx](https://www.aclweb.org/anthology/2020.acl-main.710/) | 8149 |
+| [OpenKP](https://www.aclweb.org/anthology/D19-1521/) | 200335 |
+| **Total** | **1030309** |
+
+The model was trained with the parameters:
+
 **DataLoader**:
 
 `torch.utils.data.dataloader.DataLoader` of length 2025 with parameters:
@@ -118,7 +128,6 @@ Parameters of the fit()-Method:
 }
 ```
 
-
 ## Full Model Architecture
 ```
 SentenceTransformer(
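The second usage snippet in the diff relies on a `mean_pooling` helper (visible as context in the `@@ -63,14 +62,14 @@` hunk header) that averages token embeddings while ignoring padding. As a minimal, self-contained sketch of that masked mean-pooling pattern — using dummy tensors in place of a real model forward pass, so the shapes and values here are illustrative assumptions, not the model's actual output:

```python
import torch
import torch.nn.functional as F

def mean_pooling(model_output, attention_mask):
    # model_output[0] holds per-token embeddings, shape (batch, seq_len, hidden)
    token_embeddings = model_output[0]
    # Expand the attention mask so padded positions contribute zero to the sum
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * mask, dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)  # avoid division by zero
    return summed / counts

# Dummy batch standing in for a model forward pass: 2 phrases, 4 tokens, hidden size 8
token_embeddings = torch.randn(2, 4, 8)
attention_mask = torch.tensor([[1, 1, 1, 0],   # first phrase has one padding token
                               [1, 1, 1, 1]])
phrase_embeddings = mean_pooling((token_embeddings,), attention_mask)
print(phrase_embeddings.shape)  # torch.Size([2, 8])

# Cosine similarity between the two pooled phrase vectors
sim = F.cosine_similarity(phrase_embeddings[0], phrase_embeddings[1], dim=0)
```

With a real checkpoint, `token_embeddings` and `attention_mask` would come from `model(**encoded_input)` and the tokenizer output, as in the diff's transformers snippet.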