Running on webgpu?

#2
by oanaflores - opened
  1. Any suggestions for running the model with the webgpu backend?
  2. Additionally, how can I log the nodes and their assigned execution provider? Thank you in advance!
  • I've seen the model being used in the benchmarking demo https://huggingface.co/spaces/Xenova/webgpu-embedding-benchmark comparing wasm and webgpu

  • When trying to run it on webgpu (see code excerpt below), I get the following warning/error ort-wasm-simd.jsep.js: [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
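  For question 2, a hedged sketch based on onnxruntime-web's documented session options and `ort.env` settings (option names assumed from the ort-web API; raising the severity to verbose is what makes ORT print each node's assigned execution provider to the console):

  ```javascript
  // Session options for verbose per-node logging.
  // logSeverityLevel 0 = verbose, so node/EP assignments are logged.
  const sessionOptions = {
    executionProviders: ['webgpu'],
    logSeverityLevel: 0,
    logVerbosityLevel: 0,
  };

  // In the page you would additionally raise the global log level:
  // ort.env.logLevel = 'verbose';
  // ort.env.debug = true;
  // const session = await ort.InferenceSession.create(modelUrl, sessionOptions);
  ```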

  <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.webgpu.min.js"></script>
  <script type="module">
    import { AutoTokenizer } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.16.0';

    async function runInference() {
      const session = await ort.InferenceSession.create('models/Xenova/all-MiniLM-L6-v2/onnx/model_fp16.onnx', { executionProviders: ['webgpu'], logSeverityLevel: 0 });
      const tokenizer = await AutoTokenizer.from_pretrained('Xenova/all-MiniLM-L6-v2');

      const texts = ['This is an example sentence', 'Each sentence is converted']

      const { input_ids, token_type_ids, attention_mask } = tokenizer(texts, {
        padding: true,
        truncation: true
      });

      // Use the tokenizer's own dims ([batch_size, seq_len]) instead of
      // [1, data.length], which collapses the batch of 2 into one row.
      const feeds = {
        input_ids: new ort.Tensor('int64', input_ids.data, input_ids.dims),
        token_type_ids: new ort.Tensor('int64', token_type_ids.data, token_type_ids.dims),
        attention_mask: new ort.Tensor('int64', attention_mask.data, attention_mask.dims)
      };


      const result = await session.run(feeds);

...
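  In case it's useful to anyone else: the model returns token-level embeddings, so a sentence embedding is typically obtained by attention-mask-weighted mean pooling over the sequence. A minimal dependency-free sketch (`meanPool` is a hypothetical helper, not part of any library; it assumes row-major `[seq_len, hidden_size]` data for a single sentence):

  ```javascript
  // Mean-pool token embeddings into one sentence embedding,
  // counting only positions where the attention mask is 1.
  function meanPool(hidden, mask, seqLen, hiddenSize) {
    // hidden: Float32Array of length seqLen * hiddenSize (one sentence)
    // mask:   array of 0/1 values of length seqLen
    const out = new Float32Array(hiddenSize);
    let count = 0;
    for (let t = 0; t < seqLen; t++) {
      if (!mask[t]) continue;
      count++;
      for (let h = 0; h < hiddenSize; h++) {
        out[h] += hidden[t * hiddenSize + h];
      }
    }
    for (let h = 0; h < hiddenSize; h++) out[h] /= Math.max(count, 1);
    return out;
  }
  ```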
Owner

Well, does it run? :) You should be able to ignore that warning, since I believe those are shape-related ops.

Yes, it runs! Thank you for the prompt response
