Alan Joshua committed on
Commit 8c9803b · verified · 1 Parent(s): 6001257

Update README.md

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -8,6 +8,29 @@ tags:
  license: mit
  ---
 
+ ```python
+ import onnxruntime as ort
+ import numpy as np
+ from transformers import AutoTokenizer
+ from huggingface_hub import hf_hub_download
+
+ # ── Load ───────────────────────────────────────────────────────────────────
+ tokenizer = AutoTokenizer.from_pretrained("alanjoshua2005/text-embedding", subfolder="tokenizer")
+ onnx_path = hf_hub_download("alanjoshua2005/text-embedding", "onnx/biencoder_rope.onnx")
+ session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
+
+ # ── Encode ─────────────────────────────────────────────────────────────────
+ def encode(texts):
+     if isinstance(texts, str): texts = [texts]
+     enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="np")
+     return session.run(["embeddings"], {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})[0]
+
+ # ── Test ───────────────────────────────────────────────────────────────────
+ emb = encode("Hello world!")
+ print(emb.shape)  # (1, 256)
+ ```
+
+
  # BiEncoder RoPE — Sentence Embedding Model
 
  A 34M parameter sentence embedding model trained from scratch using PyTorch.
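
As a possible follow-up to the `encode` snippet in this change, the sketch below shows one way the returned embeddings might be compared with cosine similarity. It is not part of the commit: the helper name `cosine_similarity_matrix` is hypothetical, and the stand-in vectors are placeholders for real `encode(...)` output so the example runs without downloading the model. The README does not say whether the model output is already L2-normalized, so the sketch normalizes explicitly.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    # Assumption: embeddings may not be unit-length, so normalize each row here.
    # The clip guards against division by zero for an all-zero row.
    a_unit = a / np.clip(np.linalg.norm(a, axis=1, keepdims=True), 1e-12, None)
    b_unit = b / np.clip(np.linalg.norm(b, axis=1, keepdims=True), 1e-12, None)
    return a_unit @ b_unit.T


# Stand-in vectors (hypothetical); in practice these would be encode(query_text)
# and encode(list_of_documents) from the README snippet.
query = np.array([[1.0, 0.0, 1.0]])
docs = np.array([[2.0, 0.0, 2.0], [0.0, 3.0, 0.0]])

scores = cosine_similarity_matrix(query, docs)  # shape (1, 2)
best = int(np.argmax(scores[0]))                # index of the closest document
```

With real embeddings, `scores[i, j]` would rank document `j` against query `i`; higher values indicate closer vectors.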