Update infinity example

#23

Added:

infinity_emb

Usage via Infinity (MIT licensed):

docker run \
--gpus "0" -p "7997":"7997" \
michaelf34/infinity:latest \
v2 --model-id dunzhang/stella_en_400M_v5 --revision "refs/pr/24" --dtype bfloat16 --batch-size 16 --device cuda --engine torch --port 7997 --no-bettertransformer
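
Once the container is up, embeddings can be requested over HTTP. The following is a minimal client sketch, not part of this PR. It assumes the server is reachable at `localhost:7997` and exposes an OpenAI-style `POST /embeddings` route that returns `{"data": [{"embedding": [...]}, ...]}`. The helper names `build_embedding_request` and `embed` are hypothetical; check the Swagger UI at `/docs` for the routes your version actually serves.

```python
import json
import urllib.request

# Assumptions: the container started above is reachable on localhost:7997
# and serves an OpenAI-style POST /embeddings route.
BASE_URL = "http://localhost:7997"
MODEL_ID = "dunzhang/stella_en_400M_v5"


def build_embedding_request(texts, model=MODEL_ID, base_url=BASE_URL):
    """Build (but do not send) a POST request for the assumed /embeddings route."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, timeout=30.0):
    """Send the request and return one vector per input text.

    Assumes an OpenAI-style response body: {"data": [{"embedding": [...]}, ...]}.
    """
    request = build_embedding_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]
```

With the server running, `embed(["What is an embedding server?"])` should return a list containing one vector. Building the request separately from sending it keeps the payload easy to inspect without a live server.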
michaelfeil changed pull request status to closed
michaelfeil changed pull request status to open
michaelfeil changed pull request status to closed
michaelfeil changed pull request status to open
docker run --gpus "0" -p "7997":"7997" michaelf34/infinity:latest v2 --model-id dunzhang/stella_en_400M_v5 --revision "refs/pr/24" --dtype bfloat16 --batch-size 16 --device cuda --engine torch --port 7997 --no-bettertransformer


INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO     2024-11-14 05:18:36,657 infinity_emb INFO:        infinity_server.py:89
         Creating 1engines:                                                     
         engines=['dunzhang/stella_en_400M_v5']                                 
INFO     2024-11-14 05:18:36,662 infinity_emb INFO: Anonymized   telemetry.py:30
         telemetry can be disabled via environment variable                     
         `DO_NOT_TRACK=1`.                                                      
INFO     2024-11-14 05:18:36,670 infinity_emb INFO:           select_model.py:64
         model=`dunzhang/stella_en_400M_v5` selected, using                     
         engine=`torch` and device=`cuda`                                       
INFO     2024-11-14 05:18:36,936                      SentenceTransformer.py:216
         sentence_transformers.SentenceTransformer                              
         INFO: Load pretrained SentenceTransformer:                             
         dunzhang/stella_en_400M_v5                                             
Some weights of the model checkpoint at dunzhang/stella_en_400M_v5 were not used when initializing NewModel: ['new.pooler.dense.bias', 'new.pooler.dense.weight']
- This IS expected if you are initializing NewModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NewModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
INFO     2024-11-14 05:19:21,174                      SentenceTransformer.py:355
         sentence_transformers.SentenceTransformer                              
         INFO: 2 prompts are loaded, with the keys:                             
         ['s2p_query', 's2s_query']                                             
/app/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py:1141: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO     2024-11-14 05:19:21,795 infinity_emb INFO: Getting   select_model.py:97
         timings for batch_size=16 and avg tokens per                           
         sentence=1                                                             
                 4.10     ms tokenization                                       
                 23.05    ms inference                                          
                 0.17     ms post-processing                                    
                 27.32    ms total                                              
         embeddings/sec: 585.72                                                 
INFO     2024-11-14 05:19:23,600 infinity_emb INFO: Getting  select_model.py:103
         timings for batch_size=16 and avg tokens per                           
         sentence=512                                                           
                 12.33    ms tokenization                                       
                 906.90   ms inference                                          
                 0.48     ms post-processing                                    
                 919.72   ms total                                              
         embeddings/sec: 17.40                                                  
INFO     2024-11-14 05:19:23,604 infinity_emb INFO: model    select_model.py:104
         warmed up, between 17.40-585.72 embeddings/sec at                      
         batch_size=16                                                          
INFO     2024-11-14 05:19:23,607 infinity_emb INFO:         batch_handler.py:386
         creating batching engine                                               
INFO     2024-11-14 05:19:23,609 infinity_emb INFO: ready   batch_handler.py:453
         to batch requests.                                                     
INFO     2024-11-14 05:19:23,613 infinity_emb INFO:       infinity_server.py:104
                                                                                
         ♾️  Infinity - Embedding Inference Server                               
         MIT License; Copyright (c) 2023-now Michael Feil                       
         Version 0.0.69                                                         
                                                                                
         Open the Docs via Swagger UI:                                          
         http://0.0.0.0:7997/docs                                               
                                                                                
         Access all deployed models via 'GET':                                  
         curl http://0.0.0.0:7997/models                                        
                                                                                
         Visit the docs for more information:                                   
         https://michaelfeil.github.io/infinity                                 
                                                                                
                                                                                
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)
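
For reading the warm-up numbers in the log above: the reported embeddings/sec appear to be the batch size divided by the total time per batch. This is a small arithmetic sketch under that assumption, not code taken from infinity_emb, and `embeddings_per_sec` is a hypothetical helper name.

```python
def embeddings_per_sec(batch_size, total_ms):
    """Throughput implied by one batch taking total_ms milliseconds."""
    return batch_size / (total_ms / 1000.0)


# Values copied from the warm-up lines in the log (batch_size=16).
short_inputs = embeddings_per_sec(16, 27.32)   # avg tokens per sentence = 1
long_inputs = embeddings_per_sec(16, 919.72)   # avg tokens per sentence = 512

print(f"short inputs: {short_inputs:.2f} embeddings/sec")
print(f"long inputs:  {long_inputs:.2f} embeddings/sec")
```

The results land close to the logged 585.72 and 17.40 embeddings/sec. Any small difference presumably comes from the millisecond values in the log being rounded.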
infgrad changed pull request status to merged
