Tuan Tran

tuantm

AI & ML interests

None yet



tuantm's activity

liked a model 18 days ago

reducto/RolmOCR

Image-Text-to-Text • Updated 20 days ago • 35.6k • 386

liked a model 21 days ago

lmstudio-community/openhands-lm-32b-v0.1-GGUF

Text Generation • Updated 22 days ago • 3.53k • 9

liked a model about 1 month ago

amd/Instella-3B-Instruct

Text Generation • Updated 25 days ago • 931 • 50

liked 4 models 3 months ago

liked a model 4 months ago

nampham1106/snowflake-arctic-embed-m-v2.0

liked a model 5 months ago

sail/Sailor2-20B

Text Generation • Updated Feb 20 • 12 • 10

liked a dataset 5 months ago

microsoft/orca-agentinstruct-1M-v1

Viewer • Updated Nov 1, 2024 • 1.05M • 4.49k • 436

liked a model 5 months ago

tablegpt/TableGPT2-7B

Updated Feb 13 • 9.74k • 198

liked 4 models 6 months ago

facebook/MobileLLM-1B

Text Generation • Updated about 3 hours ago • 4.93k • 121

facebook/MobileLLM-125M

Text Generation • Updated about 3 hours ago • 1.56k • 112

HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1

Marqo/marqo-ecommerce-embeddings-L

Updated Nov 12, 2024 • 117k • 35

reacted to tomaarsen's post with 🔥 6 months ago

📣 Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! It adds 2 new backends for embedding models, ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups of up to 2x-3x, plus Static Embeddings for 500x speedups at a 10-20% accuracy cost.

1️⃣ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2️⃣ OpenVINO Backend: This backend uses Intel's OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later 😉
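For illustration, here is a minimal sketch of that one-liner wrapped in a small helper. The helper name `load_encoder`, the `demo` function, and the example sentences are assumptions made for this sketch and are not part of the release; only `SentenceTransformer(..., backend=...)` and the backend names come from the library. Running `demo()` requires `sentence-transformers` with the ONNX extras installed and network access to fetch the model.

```python
# Backend names accepted by SentenceTransformer's `backend` argument as of v3.2.0.
SUPPORTED_BACKENDS = ("torch", "onnx", "openvino")


def load_encoder(model_name="all-MiniLM-L6-v2", backend="onnx"):
    """Hypothetical helper: load a Sentence Transformer with the chosen backend."""
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    # Imported lazily so the backend check works even without the library installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name, backend=backend)


def demo():
    """Encode two made-up sentences with the ONNX backend and print the shape."""
    model = load_encoder(backend="onnx")
    embeddings = model.encode(
        ["The weather is lovely today.", "It's so sunny outside!"]
    )
    print(embeddings.shape)
```

Switching `backend="onnx"` to `backend="openvino"` is the only change needed to try the other backend, assuming the matching extras are installed.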

🔒 Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1️⃣ via Model2Vec, a new technique for distilling any Sentence Transformer model into static embeddings: either load a pre-distilled model with from_model2vec, or use from_distillation to do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2️⃣ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.
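To make the "bag of token embeddings" idea concrete, here is a toy sketch in plain Python. It is not the library's implementation: the vocabulary, the 3-dimensional vectors, and the whitespace tokenizer are all made up for illustration, whereas a real static embedding model gets its vectors from distillation or finetuning and uses a proper tokenizer.

```python
import math

# Hypothetical toy vocabulary: token -> fixed 3-dimensional vector.
# The numbers are invented for illustration only.
TOKEN_VECTORS = {
    "cheap": [0.9, 0.1, 0.0],
    "budget": [0.8, 0.2, 0.0],
    "laptop": [0.1, 0.9, 0.1],
    "notebook": [0.2, 0.8, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
DIM = 3


def embed(text):
    """Sum the vectors of all known tokens; unknown tokens are skipped."""
    total = [0.0] * DIM
    for token in text.lower().split():
        vector = TOKEN_VECTORS.get(token)
        if vector is not None:
            total = [a + b for a, b in zip(total, vector)]
    return total


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


query = embed("cheap laptop")
print(cosine(query, embed("budget notebook")))  # related text: high similarity
print(cosine(query, embed("banana")))           # unrelated text: low similarity
```

Because embedding a text is just a table lookup plus a sum, no neural network forward pass is involved, which is where the speed comes from.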

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html


liked a model 6 months ago

CohereLabs/aya-expanse-8b

Text Generation • Updated 7 days ago • 21.6k • 360

liked a model 7 months ago

facebook/MEXMA

liked a dataset 7 months ago

lakritidis/product-matching

Viewer • Updated May 9, 2024 • 35.3k • 38 • 2

upvoted a paper 7 months ago

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published Sep 25, 2024 • 63