binhcode25 committed on
Commit a3c665b
1 Parent(s): cc575c2

Add new SentenceTransformer model.

Files changed (3)
  1. README.md +46 -0
  2. model.onnx +2 -2
  3. tokenizer.json +14 -2
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ library_name: light-embed
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+
+ ---
+
+ # baai-llm-embedder-onnx
+
+ This is the ONNX version of the Sentence Transformers model BAAI/llm-embedder for sentence embedding, optimized for speed and a small footprint. By using onnxruntime and tokenizers instead of the heavier sentence-transformers and transformers libraries, this version keeps the dependency footprint small and execution fast. Below are the details of the model:
+ - Base model: BAAI/llm-embedder
+ - Embedding dimension: 768
+ - Max sequence length: 512
+ - File size on disk: 0.41 GB
+ - Pooling incorporated: Yes
+
+ This ONNX model contains all components of the original Sentence Transformers model:
+ Transformer, Pooling, Normalize
+
+ <!--- Describe your model here -->
+
+ ## Usage (LightEmbed)
+
+ Using this model is straightforward once you have [LightEmbed](https://pypi.org/project/light-embed/) installed:
+
+ ```
+ pip install -U light-embed
+ ```
+
+ Then you can use the model like this:
+
+ ```python
+ from light_embed import TextEmbedding
+ sentences = ["This is an example sentence", "Each sentence is converted"]
+
+ model = TextEmbedding('BAAI/llm-embedder')
+ embeddings = model.encode(sentences)
+ print(embeddings)
+ ```
+
+ ## Citing & Authors
+
+ Binh Nguyen / binhcode25@gmail.com
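
The README above states that pooling and normalization are already part of the exported graph ("Pooling incorporated: Yes"; components Transformer, Pooling, Normalize). The minimal sketch below shows what that implies when calling onnxruntime and tokenizers directly instead of going through LightEmbed; the local file names, the graph input names, and treating the first output as the finished sentence embedding are illustrative assumptions, not details confirmed by this commit.

```python
# Hypothetical sketch: run the exported graph directly with onnxruntime + tokenizers.
# "model.onnx" / "tokenizer.json" are assumed local copies of this repo's files, and
# reading output 0 as the pooled, normalized embedding is an assumption.
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
session = ort.InferenceSession("model.onnx")

sentences = ["This is an example sentence", "Each sentence is converted"]
# This commit's tokenizer.json enables batch padding and truncation, so the
# encodings come back rectangular and ready to stack into arrays.
encodings = tokenizer.encode_batch(sentences)

# Build the feed from whatever inputs the graph actually declares (BERT-style
# exports usually take input_ids, attention_mask and sometimes token_type_ids).
arrays = {
    "input_ids": np.array([e.ids for e in encodings], dtype=np.int64),
    "attention_mask": np.array([e.attention_mask for e in encodings], dtype=np.int64),
    "token_type_ids": np.array([e.type_ids for e in encodings], dtype=np.int64),
}
feed = {inp.name: arrays[inp.name] for inp in session.get_inputs() if inp.name in arrays}

embeddings = session.run(None, feed)[0]
print(embeddings.shape)  # expected (2, 768) if pooling is indeed inside the graph

# Because the Normalize step is included, cosine similarity is just a dot product.
similarity = embeddings @ embeddings.T
print(similarity)
```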
model.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d6a5c5cb2457a07733e95b8a1480560df251127722b572f85687eb773f80e13e
- size 435917615
+ oid sha256:e53797c693265afc823292398a7b01cecd75efac0fc6e86a771200e87495abeb
+ size 435917673
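
For context on the tokenizer.json diff that follows: the updated file bakes right-truncation at 256 tokens and batch-longest [PAD] padding (pad_id 0) directly into the tokenizer, where both fields were previously null, so these defaults apply automatically whenever the file is loaded with the tokenizers library. A small sketch of that behaviour, with the local path as an illustrative assumption:

```python
# Hypothetical sketch of the new tokenizer defaults; "tokenizer.json" is assumed
# to be a local copy of the file changed in this commit.
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")

encodings = tokenizer.encode_batch([
    "a short sentence",
    "a noticeably longer sentence that sets the padded length for the whole batch",
])

# Both encodings come back the same length: padded to the longest item in the
# batch (BatchLongest strategy) and right-truncated at 256 tokens if longer.
print([len(e.ids) for e in encodings])
print(encodings[0].attention_mask)  # zeros mark the [PAD] positions
```

With padding and truncation serialized in the file, callers no longer need to configure them before batching; encode_batch should produce rectangular, ready-to-batch output out of the box.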
tokenizer.json CHANGED
@@ -1,7 +1,19 @@
  {
    "version": "1.0",
-   "truncation": null,
-   "padding": null,
+   "truncation": {
+     "direction": "Right",
+     "max_length": 256,
+     "strategy": "LongestFirst",
+     "stride": 0
+   },
+   "padding": {
+     "strategy": "BatchLongest",
+     "direction": "Right",
+     "pad_to_multiple_of": null,
+     "pad_id": 0,
+     "pad_type_id": 0,
+     "pad_token": "[PAD]"
+   },
    "added_tokens": [
      {
        "id": 0,