dunzhang/stella_en_400M_v5 · Update infinity example

10 days ago


Added an infinity_emb example:

docker run \
--gpus "0" -p "7997":"7997" \
michaelf34/infinity:latest \
v2 --model-id dunzhang/stella_en_400M_v5 --revision "refs/pr/24" --dtype bfloat16 --batch-size 16 --device cuda --engine torch --port 7997 --no-bettertransformer
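Once the container is running, a client can call it over HTTP. The sketch below uses only the Python standard library. It assumes the server listens on localhost:7997 and exposes an OpenAI-style POST /embeddings route that accepts {"model": ..., "input": [...]} and returns {"data": [{"embedding": [...], "index": i}, ...]}. That route, the payload shape, and the helper names (build_embeddings_request, parse_embeddings_response, embed) are illustrative assumptions and are not defined by this PR.

```python
import json
import urllib.request

# Assumed server location and model id (model id matches the docker command).
BASE_URL = "http://localhost:7997"
MODEL_ID = "dunzhang/stella_en_400M_v5"


def build_embeddings_request(texts, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a JSON POST request for a batch of texts.

    Assumes an OpenAI-style /embeddings route; adjust if your server differs.
    """
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings_response(payload):
    """Return embedding vectors ordered by each item's 'index' field.

    Assumes the OpenAI-style {"data": [{"embedding": [...], "index": i}]} shape.
    """
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, timeout=30.0):
    """Send texts to a running server and return one vector per text."""
    req = build_embeddings_request(texts)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_embeddings_response(json.load(resp))
```

With the container up, embed(["What is the capital of France?"]) should return a list containing one vector. Request building and response parsing are kept separate from the network call so each can be checked without a server. The sketch sends raw text and does not prepend any query prompt.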

Update infinity example · 1ef9639c

michaelfeil changed pull request status to closed 10 days ago

michaelfeil changed pull request status to open 8 days ago

Update README.md · 8342a640

michaelfeil changed pull request status to closed 8 days ago

michaelfeil changed pull request status to open 8 days ago

michaelfeil

8 days ago

docker run --gpus "0" -p "7997":"7997" michaelf34/infinity:latest v2 --model-id dunzhang/stella_en_400M_v5 --revision "refs/pr/24" --dtype bfloat16 --batch-size 16 --device cuda --engine torch --port 7997 --no-bettertransformer


INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO     2024-11-14 05:18:36,657 infinity_emb INFO:        infinity_server.py:89
         Creating 1engines:                                                     
         engines=['dunzhang/stella_en_400M_v5']                                 
INFO     2024-11-14 05:18:36,662 infinity_emb INFO: Anonymized   telemetry.py:30
         telemetry can be disabled via environment variable                     
         `DO_NOT_TRACK=1`.                                                      
INFO     2024-11-14 05:18:36,670 infinity_emb INFO:           select_model.py:64
         model=`dunzhang/stella_en_400M_v5` selected, using                     
         engine=`torch` and device=`cuda`                                       
INFO     2024-11-14 05:18:36,936                      SentenceTransformer.py:216
         sentence_transformers.SentenceTransformer                              
         INFO: Load pretrained SentenceTransformer:                             
         dunzhang/stella_en_400M_v5                                             
Some weights of the model checkpoint at dunzhang/stella_en_400M_v5 were not used when initializing NewModel: ['new.pooler.dense.bias', 'new.pooler.dense.weight']
- This IS expected if you are initializing NewModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NewModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
INFO     2024-11-14 05:19:21,174                      SentenceTransformer.py:355
         sentence_transformers.SentenceTransformer                              
         INFO: 2 prompts are loaded, with the keys:                             
         ['s2p_query', 's2s_query']                                             
/app/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py:1141: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO     2024-11-14 05:19:21,795 infinity_emb INFO: Getting   select_model.py:97
         timings for batch_size=16 and avg tokens per                           
         sentence=1                                                             
                 4.10     ms tokenization                                       
                 23.05    ms inference                                          
                 0.17     ms post-processing                                    
                 27.32    ms total                                              
         embeddings/sec: 585.72                                                 
INFO     2024-11-14 05:19:23,600 infinity_emb INFO: Getting  select_model.py:103
         timings for batch_size=16 and avg tokens per                           
         sentence=512                                                           
                 12.33    ms tokenization                                       
                 906.90   ms inference                                          
                 0.48     ms post-processing                                    
                 919.72   ms total                                              
         embeddings/sec: 17.40                                                  
INFO     2024-11-14 05:19:23,604 infinity_emb INFO: model    select_model.py:104
         warmed up, between 17.40-585.72 embeddings/sec at                      
         batch_size=16                                                          
INFO     2024-11-14 05:19:23,607 infinity_emb INFO:         batch_handler.py:386
         creating batching engine                                               
INFO     2024-11-14 05:19:23,609 infinity_emb INFO: ready   batch_handler.py:453
         to batch requests.                                                     
INFO     2024-11-14 05:19:23,613 infinity_emb INFO:       infinity_server.py:104
                                                                                
         ♾️  Infinity - Embedding Inference Server                               
         MIT License; Copyright (c) 2023-now Michael Feil                       
         Version 0.0.69                                                         
                                                                                
         Open the Docs via Swagger UI:                                          
         http://0.0.0.0:7997/docs                                               
                                                                                
         Access all deployed models via 'GET':                                  
         curl http://0.0.0.0:7997/models                                        
                                                                                
         Visit the docs for more information:                                   
         https://michaelfeil.github.io/infinity                                 
                                                                                
                                                                                
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)
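The warm-up timings in the log are internally consistent. Each logged total is approximately the sum of the tokenization, inference, and post-processing times. The embeddings/sec figure is approximately batch_size divided by the total time for one batch. The sketch below repeats that arithmetic using the rounded numbers printed in the log. It is a consistency check on the log output and is not infinity's own timing code. The helper names are hypothetical.

```python
# Stage timings (ms) copied from the two warm-up log entries (batch_size=16).
SHORT = {"tokenization": 4.10, "inference": 23.05, "post_processing": 0.17, "total": 27.32}
LONG = {"tokenization": 12.33, "inference": 906.90, "post_processing": 0.48, "total": 919.72}
BATCH_SIZE = 16


def stage_sum_ms(t):
    """Sum of the per-stage timings; should be close to the logged total."""
    return t["tokenization"] + t["inference"] + t["post_processing"]


def embeddings_per_sec(batch_size, total_ms):
    """Throughput implied by processing one batch in total_ms milliseconds."""
    return batch_size / (total_ms / 1000.0)


for name, t in (("1 token/sentence", SHORT), ("512 tokens/sentence", LONG)):
    print(
        f"{name}: stages={stage_sum_ms(t):.2f} ms, "
        f"logged total={t['total']:.2f} ms, "
        f"~{embeddings_per_sec(BATCH_SIZE, t['total']):.2f} embeddings/sec"
    )
```

For the 512-token batch, this gives 17.40 embeddings/sec, matching the log. For the 1-token batch, it gives about 585.65 against the logged 585.72. The small gap is presumably because the log rounds the stage times before printing them.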

infgrad changed pull request status to merged 8 days ago