Revankumar committed on
Commit
fa3cef4
1 Parent(s): f8d1ea5

Update README.md

Files changed (1)
  1. README.md +78 -0
README.md CHANGED
@@ -1,3 +1,81 @@
1
  ---
2
  license: mit
3
  ---
4
+
5
+ ---
6
+ tags:
7
+ - sentence-transformers
8
+ - feature-extraction
9
+ ---
10
+ # Name of Model
11
+
12
+ <!--- Describe your model here -->
13
+
14
+ ## Model Description
15
+ The model consists of the following layers:
16
+
17
+ (0) Base Transformer Type: BAAI/bge-small-en-v1.5
18
+
19
+ (1) Mean Pooling
20
+
21
+
22
+ ## Usage (Sentence-Transformers)
23
+
24
+ Using this model is easiest with [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:
25
+
26
+ ```
27
+ pip install -U sentence-transformers
28
+ ```
29
+
30
+ Then you can use the model like this:
31
+
32
+ ```python
33
+ from sentence_transformers import SentenceTransformer
34
+ sentences = ["This is an example sentence"]
35
+ model = SentenceTransformer('model_name')
36
+ embeddings = model.encode(sentences)
37
+ print(embeddings)
38
+ ```
39
+
40
+
41
+ ## Usage (HuggingFace Transformers)
42
+
43
+ ```python
44
+ from transformers import AutoTokenizer, AutoModel
45
+ import torch
46
+ # Mean pooling: take the attention mask into account so padding tokens do not skew the average
47
+ def mean_pooling(model_output, attention_mask):
48
+     token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
49
+     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
50
+     sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
51
+     sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
52
+     return sum_embeddings / sum_mask
53
+ # Sentences we want sentence embeddings for
54
+ sentences = ['This is an example sentence']
55
+ # Load model from HuggingFace Hub
56
+ tokenizer = AutoTokenizer.from_pretrained('model_name')
57
+ model = AutoModel.from_pretrained('model_name')
58
+ # Tokenize sentences
59
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
60
+ # Compute token embeddings
61
+ with torch.no_grad():
62
+     model_output = model(**encoded_input)
63
+ # Perform pooling. In this case, mean pooling.
64
+ sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
65
+ print("Sentence embeddings:")
66
+ print(sentence_embeddings)
67
+ ```
68
+
69
+
70
+
71
+ ## Training Procedure
72
+
73
+ <!--- Describe how your model was trained -->
74
+
75
+ ## Evaluation Results
76
+
77
+ <!--- Describe how your model was evaluated -->
78
+
79
+ ## Citing & Authors
80
+
81
+ <!--- Describe where people can find more information -->
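
As a sanity check on the `mean_pooling` helper added in this diff, here is a minimal sketch of the same masked average for a single sentence, written in plain Python so it runs without `torch`. The function name `masked_mean` and the toy values are illustrative assumptions and are not part of the README.

```python
def masked_mean(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding (0) is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    # Same guard against division by zero as torch.clamp(..., min=1e-9) in the README code
    count = max(sum(attention_mask), 1e-9)
    return [s / count for s in summed]


# Hypothetical toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padded vector does not affect the result, which is the reason the README code multiplies by the expanded attention mask before summing.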