Usage: Infinity
docker run --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" \
  michaelf34/infinity:0.0.69 v2 \
  --model-id Alibaba-NLP/gte-multilingual-base --revision "main" \
  --dtype bfloat16 --batch-size 32 --device cuda --engine torch --port 7997
Unable to find image 'michaelf34/infinity:0.0.69' locally
0.0.69: Pulling from michaelf34/infinity
Digest: sha256:13cdf7479c83bef5ed8887ca015b0c43dc37eb6cad3de6b8b08fd1689f6c249a
Status: Downloaded newer image for michaelf34/infinity:0.0.69
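The image is pulled only on first use; later runs start from the local copy. The v2 arguments map directly onto the startup log below: --model-id names the Hugging Face repository, --revision pins the checkpoint, --dtype/--batch-size/--device/--engine control inference, and --port must match the host mapping given to -p. On a machine without a GPU (a variant not shown in this log, assuming your Infinity build supports it), drop --gpus all and pass --device cpu.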
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO 2024-11-15 19:11:07,676 infinity_emb INFO: Creating 1 engines: engines=['Alibaba-NLP/gte-multilingual-base'] (infinity_server.py:89)
INFO 2024-11-15 19:11:07,680 infinity_emb INFO: Anonymized telemetry can be disabled via environment variable DO_NOT_TRACK=1. (telemetry.py:30)
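To opt out of the anonymized telemetry, pass the variable from the log above through Docker's standard -e flag:

docker run -e DO_NOT_TRACK=1 --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" michaelf34/infinity:0.0.69 v2 --model-id Alibaba-NLP/gte-multilingual-base (remaining arguments unchanged)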
INFO 2024-11-15 19:11:07,689 infinity_emb INFO: model=Alibaba-NLP/gte-multilingual-base selected, using engine=torch and device=cuda (select_model.py:64)
INFO 2024-11-15 19:11:07,819 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: Alibaba-NLP/gte-multilingual-base (SentenceTransformer.py:216)
Some weights of the model checkpoint at Alibaba-NLP/gte-multilingual-base were not used when initializing NewModel: ['classifier.bias', 'classifier.weight']
- This IS expected if you are initializing NewModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NewModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
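This warning is expected here: the gte-multilingual-base checkpoint ships a classification head (classifier.weight, classifier.bias) that is not needed for embedding inference, so it can be ignored.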
INFO 2024-11-15 19:11:10,707 infinity_emb INFO: Adding optimizations via Huggingface optimum. (acceleration.py:56)
The class optimum.bettertransformers.transformation.BetterTransformer is deprecated and will be removed in a future release.
WARNING 2024-11-15 19:11:10,708 infinity_emb WARNING: BetterTransformer is not available for model: <class 'transformers_modules.Alibaba-NLP.new-impl.40ced75c3017eb27626c9d4ea981bde21a2662f4.modeling.NewModel'> Continue without bettertransformer modeling code. (acceleration.py:67)
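This warning is also harmless: the model uses custom (trust_remote_code) modeling code that the optimum BetterTransformer path does not recognize, so Infinity proceeds with the model's own implementation.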
INFO 2024-11-15 19:11:11,224 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=1 (select_model.py:97)
5.79 ms tokenization
12.34 ms inference
0.16 ms post-processing
18.29 ms total
embeddings/sec: 1749.85
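Sanity check: throughput is batch_size divided by total latency, 32 / 18.29 ms ≈ 1,750 embeddings/sec, matching the logged figure up to rounding.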
INFO 2024-11-15 19:11:12,661 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=512 (select_model.py:103)
14.79 ms tokenization
723.84 ms inference
0.18 ms post-processing
738.81 ms total
embeddings/sec: 43.31
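At 512 tokens per sentence, inference dominates the budget (723.84 of 738.81 ms), and 32 / 738.81 ms ≈ 43.31 embeddings/sec, roughly 40x below the 1-token case. Real workloads land between these two bounds depending on input length.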
INFO 2024-11-15 19:11:12,662 infinity_emb INFO: model warmed up, between 43.31-1749.85 embeddings/sec at batch_size=32 (select_model.py:104)
INFO 2024-11-15 19:11:12,664 infinity_emb INFO: creating batching engine (batch_handler.py:386)
INFO 2024-11-15 19:11:12,665 infinity_emb INFO: ready to batch requests. (batch_handler.py:453)
INFO 2024-11-15 19:11:12,667 infinity_emb INFO: (infinity_server.py:104)
♾️ Infinity - Embedding Inference Server
MIT License; Copyright (c) 2023-now Michael Feil
Version 0.0.69
Open the Docs via Swagger UI: http://0.0.0.0:7997/docs
Access all deployed models via 'GET': curl http://0.0.0.0:7997/models
Visit the docs for more information: https://michaelfeil.github.io/infinity
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)
INFO: 172.17.0.1:56160 - "GET /docs HTTP/1.1" 200 OK
INFO: 172.17.0.1:56160 - "GET /openapi.json HTTP/1.1" 200 OK
INFO: 172.17.0.1:58364 - "POST /embeddings HTTP/1.1" 200 OK
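Embeddings can then be requested with a plain curl call. The JSON below is a minimal sketch of the OpenAI-compatible /embeddings request schema (see the Swagger UI at /docs for the authoritative shape); the input sentence is illustrative:

curl http://0.0.0.0:7997/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Alibaba-NLP/gte-multilingual-base", "input": ["Infinity serves embeddings over HTTP."]}'

The response contains one embedding vector per input string, returned in request order.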