Upload 4 files

#17
by coreprinciple - opened

Add BGE-M3 (BAAI/bge-m3)

Model: BAAI/bge-m3
Architecture: XLM-RoBERTa (large), 568M parameters
Embedding dimensions: 1024
Max sequence length: 8192 tokens
Languages: 100+

Conversion

Converted with optimum-cli export onnx --model BAAI/bge-m3 --task feature-extraction (output directory argument not shown).
Validated the ONNX output against PyTorch: cosine similarity > 0.9999 for every
English, French, and Chinese test sentence.
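
The validation step can be sketched roughly as below. This is an illustration only, not the script used for this PR: it assumes the PyTorch and ONNX embeddings have already been computed for the same sentences and are passed in as plain lists of floats. The helper names `cosine_similarity` and `validate_embeddings` are hypothetical, and the pooling used to get one vector per sentence is not stated in the PR.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def validate_embeddings(reference, candidate, threshold=0.9999):
    """Compare paired embeddings (e.g. PyTorch vs ONNX) sentence by sentence.

    Returns (passed, scores), where passed is True only if every pair
    scores strictly above the threshold. The 0.9999 default mirrors the
    figure quoted in this PR.
    """
    if len(reference) != len(candidate):
        raise ValueError("need one candidate embedding per reference embedding")
    scores = [cosine_similarity(r, c) for r, c in zip(reference, candidate)]
    return all(s > threshold for s in scores), scores
```

Checking each sentence separately, rather than averaging the scores, keeps a single badly converted input from being hidden by the others.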

Local testing

Tested with Typesense 29.0 via Docker:

  • Collection creation
  • Document indexing with auto-embedding
  • Semantic search (English)
  • Cross-lingual semantic search (French query → English results)
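
For readers who want to repeat these checks, the request bodies might look like the sketch below. It only builds the JSON payloads: a collection schema with an auto-embedding field (sent to POST /collections) and the parameters for a semantic search on that field. The model name `ts/bge-m3`, the collection name `docs`, the `title` field, and the sample query are assumptions for illustration and are not taken from this PR.

```python
import json

# Assumption: the name this model would be served under once merged.
MODEL_NAME = "ts/bge-m3"


def build_collection_schema(name, model_name=MODEL_NAME):
    """Schema with a float[] field that Typesense embeds from `title`."""
    return {
        "name": name,
        "fields": [
            {"name": "title", "type": "string"},
            {
                "name": "embedding",
                "type": "float[]",
                "embed": {
                    "from": ["title"],
                    "model_config": {"model_name": model_name},
                },
            },
        ],
    }


def build_semantic_search(query):
    """Search parameters that query the auto-embedding field."""
    return {
        "q": query,
        "query_by": "embedding",
        # Leave the 1024-float vector out of each hit.
        "exclude_fields": "embedding",
    }


if __name__ == "__main__":
    print(json.dumps(build_collection_schema("docs"), indent=2))
    # Hypothetical French query for the cross-lingual check.
    print(json.dumps(build_semantic_search("comment réinitialiser mon mot de passe"), indent=2))
```

Documents indexed into such a collection need only the `title` value; the `embedding` field is filled in by the server at indexing time.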

Config

model_type: xlm_roberta
vocab_file_name: sentencepiece.bpe.model

Ready to merge
This branch is ready to get merged automatically.