maximedb committed on
Commit
3b9b6cc
1 Parent(s): 2f341ce

Update README.md

Files changed (1)
  1. README.md +23 -47
README.md CHANGED
@@ -1,63 +1,64 @@
1
  ---
2
  pipeline_tag: sentence-similarity
 
3
  tags:
4
  - sentence-transformers
5
  - feature-extraction
6
  - sentence-similarity
7
  - transformers
 
 
8
  ---
9
 
10
- # clips/mfaq
11
 
12
- This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
13
 
14
- <!--- Describe your model here -->
15
-
16
- ## Usage (Sentence-Transformers)
17
-
18
- Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
19
 
20
  ```
21
  pip install -U sentence-transformers
22
  ```
23
 
24
- Then you can use the model like this:
 
 
25
 
 
26
  ```python
27
  from sentence_transformers import SentenceTransformer
28
- sentences = ["This is an example sentence", "Each sentence is converted"]
 
 
 
 
29
 
30
  model = SentenceTransformer('clips/mfaq')
31
- embeddings = model.encode(sentences)
32
  print(embeddings)
33
  ```
34
 
35
-
36
-
37
- ## Usage (HuggingFace Transformers)
38
- Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.
39
 
40
  ```python
41
  from transformers import AutoTokenizer, AutoModel
42
  import torch
43
 
44
-
45
- #Mean Pooling - Take attention mask into account for correct averaging
46
  def mean_pooling(model_output, attention_mask):
47
      token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
48
      input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
49
      return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
50
 
 
 
 
 
51
 
52
- # Sentences we want sentence embeddings for
53
- sentences = ['This is an example sentence', 'Each sentence is converted']
54
-
55
- # Load model from HuggingFace Hub
56
  tokenizer = AutoTokenizer.from_pretrained('clips/mfaq')
57
  model = AutoModel.from_pretrained('clips/mfaq')
58
 
59
  # Tokenize sentences
60
- encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
61
 
62
  # Compute token embeddings
63
  with torch.no_grad():
@@ -65,29 +66,4 @@ with torch.no_grad():
65
 
66
  # Perform pooling. In this case, mean pooling.
67
  sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
68
-
69
- print("Sentence embeddings:")
70
- print(sentence_embeddings)
71
- ```
72
-
73
-
74
-
75
- ## Evaluation Results
76
-
77
- <!--- Describe how your model was evaluated -->
78
-
79
- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=clips/mfaq)
80
-
81
-
82
-
83
- ## Full Model Architecture
84
- ```
85
- SentenceTransformer(
86
- (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
87
- (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
88
- )
89
- ```
90
-
91
- ## Citing & Authors
92
-
93
- <!--- Describe where people can find more information -->
 
1
  ---
2
  pipeline_tag: sentence-similarity
3
+ license: apache-2.0
4
  tags:
5
  - sentence-transformers
6
  - feature-extraction
7
  - sentence-similarity
8
  - transformers
9
+ datasets:
10
+ - clips/mfaq
11
  ---
12
 
13
+ # MFAQ
14
 
15
+ This is a FAQ retrieval model: it ranks candidate answers by their relevance to a given question. It was trained on the [MFAQ dataset](https://huggingface.co/datasets/clips/mfaq).
16
 
17
+ ## Installation
 
 
 
 
18
 
19
  ```
20
  pip install -U sentence-transformers
21
  ```
22
 
23
+ ## Usage
24
+ You can use MFAQ with sentence-transformers or directly with a HuggingFace model.
25
+ In both cases, questions need to be prepended with `<Q>`, and answers with `<A>`.
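
The prefix rule above can be wrapped in two small helpers so that raw text is never passed to the model unmarked. This is a minimal sketch, not part of the model's API: `as_question`, `as_answer`, and the two constants are hypothetical names introduced here for illustration, and the check that avoids doubling an existing prefix is an assumption about what callers would want.

```python
# Hypothetical helpers; only the "<Q>" and "<A>" markers come from the README.
QUESTION_PREFIX = "<Q>"
ANSWER_PREFIX = "<A>"


def as_question(text: str) -> str:
    """Mark text as a question, leaving an existing <Q> prefix untouched."""
    return text if text.startswith(QUESTION_PREFIX) else QUESTION_PREFIX + text


def as_answer(text: str) -> str:
    """Mark text as an answer, leaving an existing <A> prefix untouched."""
    return text if text.startswith(ANSWER_PREFIX) else ANSWER_PREFIX + text


print(as_question("How many models can I host on HuggingFace?"))
print(as_answer("All plans come with unlimited private models and datasets."))
```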
26
 
27
+ #### Sentence Transformers
28
  ```python
29
  from sentence_transformers import SentenceTransformer
30
+
31
+ question = "<Q>How many models can I host on HuggingFace?"
32
+ answer_1 = "<A>All plans come with unlimited private models and datasets."
33
+ answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
34
+ answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."
35
 
36
  model = SentenceTransformer('clips/mfaq')
37
+ embeddings = model.encode([question, answer_1, answer_2, answer_3])
38
  print(embeddings)
39
  ```
40
 
41
+ #### HuggingFace Transformers
 
 
 
42
 
43
  ```python
44
  from transformers import AutoTokenizer, AutoModel
45
  import torch
46
 
 
 
47
  def mean_pooling(model_output, attention_mask):
48
      token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
49
      input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
50
      return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
51
 
52
+ question = "<Q>How many models can I host on HuggingFace?"
53
+ answer_1 = "<A>All plans come with unlimited private models and datasets."
54
+ answer_2 = "<A>AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem."
55
+ answer_3 = "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link - as low as $10 per job."
56
 
 
 
 
 
57
  tokenizer = AutoTokenizer.from_pretrained('clips/mfaq')
58
  model = AutoModel.from_pretrained('clips/mfaq')
59
 
60
  # Tokenize sentences
61
+ encoded_input = tokenizer([question, answer_1, answer_2, answer_3], padding=True, truncation=True, return_tensors='pt')
62
 
63
  # Compute token embeddings
64
  with torch.no_grad():
 
66
 
67
  # Perform pooling. In this case, mean pooling.
68
  sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
69
+ ```
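
Both snippets stop once the embeddings are computed, while the model is described as ranking answers for a question. The sketch below shows one way the ranking step could look. It assumes cosine similarity is a suitable score, which the README does not state; `cosine_similarity` and `rank_answers` are hypothetical helpers, and the toy 3-dimensional vectors only stand in for real model output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_answers(question_embedding, answer_embeddings):
    """Return (answer_index, score) pairs sorted from most to least similar."""
    scores = [cosine_similarity(question_embedding, a) for a in answer_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy vectors, not real embeddings: the second answer points the same way as the question.
toy_question = [1.0, 0.0, 0.0]
toy_answers = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

ranking = rank_answers(toy_question, toy_answers)
for index, score in ranking:
    print(f"answer {index}: {score:.4f}")
```

With real output, the first embedding would be the question and the rest the answers, so the call would look something like `rank_answers(sentence_embeddings[0].tolist(), sentence_embeddings[1:].tolist())`.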